CAPTURE COMPOUNDS, COLLECTIONS THEREOF AND METHODS FOR
ANALYZING THE PROTEOME AND COMPLEX COMPOSITIONS 

RELATED APPLICATIONS 

Benefit of priority is claimed to U.S. provisional application Serial
No. 60/306,019, filed July 16, 2001, to Koster, entitled "COMPOUNDS
AND METHODS FOR ANALYZING THE PROTEOME;" to U.S. provisional
application Serial No. 60/314,123, filed August 21, 2001, to Koster,
entitled "COMPOUNDS AND METHODS FOR ANALYZING THE
PROTEOME;" and to U.S. provisional application Serial No. 60/363,433,
filed March 11, 2002, to Koster et al., entitled "COMPOUNDS AND
METHODS FOR ANALYZING THE PROTEOME." This application is also
related to U.S. utility application No. (attorney docket no. 24743-2305),
filed July 16, 2002.

The disclosure of each of the above-referenced provisional patent
applications and the U.S. utility application is incorporated herein by
reference in its entirety.

FIELD

Provided herein are compounds and methods using the compounds 
to specifically and selectively analyze biomolecules. In particular, the 
compounds and methods are useful for analyzing the proteome. 
BACKGROUND 

Understanding the basis of disease and the development of
therapeutic and preventative treatments has evolved over the last century
from empirical observation and experimentation to genome-wide mutation
scanning. The revolution in genomics has provided researchers with the
tools to look for a genomic basis for disease. The Human Genome effort
has generated a raw sequence of the 3 billion base pairs of the human
genome and revealed about 35,000 genes. Genetic variations among
different individuals and within and between populations are being studied
in order to determine their association with predisposition to disease or
their correlation to drug efficacy and/or side effects. The promise of
personalized medicine based on a panel of genetic markers has tantalized
the healthcare community and provides an important goal for those
focused on providing diagnostic and treatment options for healthcare
providers and patients.

With the development of a variety of tools in molecular biology, such
as nucleic acid amplification methods, cloning and expression systems and
methods, disease analysis has been based on a genomics or bottom-up
approach. This approach presumes that a genetic change or set of
changes will have a long-reaching effect on protein function by affecting
mRNA transcription or protein structure and function.

Technologies have been developed to analyze single nucleotide
polymorphisms (SNPs) on an industrial scale (e.g., MassARRAY™ and the
MassARRAY® system, Sequenom, Inc., San Diego, CA) and in pooled
samples to study the frequency of SNPs in populations of various gender,
ethnicity, age and health condition. The ultimate goal of these efforts is
to understand the etiology of disease on the molecular level (e.g., based
on genetic variances (pharmacogenomics)), and to develop diagnostic assays
and effective drugs with few or no side-effects.

Genomics has fallen short of the original expectation that this
strategy could be used to stratify a population relative to a defined
phenotype, including differences between normal and disease patient
populations. Although single genetic markers have been
found to be associated with or cause or predict a specific disease state,
genomic information may not be sufficient to stratify individual
populations by the association of an SNP (or SNPs) with a given
disease, drug side-effect or other target phenotype. Because of the large
number of potential targets and regulatory signals that affect protein
translation, it is not sufficient to establish the differential expression
profiles of messenger RNA in comparing phenotypes or populations, such
as healthy and disease states, for example by analyses using expression
DNA chips (e.g., GeneChip technology, Affymetrix, Inc., Santa Clara,
CA; LifeArray technology, Incyte Genomics, Inc., Palo Alto, CA). The
metabolic activities in a cell are not performed by mRNA but rather by the
translated proteins and subsequently posttranslationally modified
products, such as the alkylated, glycosylated and phosphorylated
products.

The study of proteomics encompasses the study of individual
proteins and how these proteins function within a biochemical pathway.
Proteomics also includes the study of protein interactions, including how
they form the architecture that constitutes living cells. In many human
diseases such as cancer, Alzheimer's disease and diabetes, as well as host
responses to infectious diseases, the elucidation of the complex
interactions of regulatory proteins, which can cause diseases, is a critical
step to finding effective treatment. Often, SNPs and other nucleic acid
mutations occur in genes whose products are such proteins as (1) growth
related hormones, (2) membrane receptors for growth hormones,
(3) components of the trans-membrane signal pathway and (4) DNA
binding proteins that act on transcription and the inactivation of
suppressor genes (e.g., P53) causing the onset of disease.

Complex protein mixtures are analyzed by two-dimensional (2D) gel
electrophoresis and subsequent image processing to identify changes in
the pattern (structural changes) or intensity of various protein spots.
Two-dimensional gel electrophoresis is a laborious, error-prone method
with low reproducibility and cannot be effectively automated. This gel
technology is unable to effectively analyze membrane proteins. Further,
the resolution of 2D gels is insufficient to analyze the profile of all
proteins present in a mixture.

Available protein chips are limited by their ability to specifically
capture hydrophobic and membrane proteins, which are frequently targets of
drug development. Once bound to the chip, proteins are highly unstable
and their structures often do not reflect the true conformation found
under physiological conditions.

Thus, there is a need to develop technologies for analysis of the
proteome that allow scaling up to industrial levels with the features of an
industrial process: high accuracy, reproducibility and flexibility, in that the
process is high-throughput, automatable and cost-effective. There is a
need to develop technologies that permit probing and identification of
proteins and other biomolecules in their native conformation using
automated protocols and systems therefor. In particular, there is a need
to develop strategies and technologies for identification and
characterization of hydrophobic proteins under physiological conditions.
Therefore, among the objects herein, it is an object herein to provide such
technologies.
SUMMARY 

Provided herein are methods, capture compounds (also referred to
herein as capture agents) and collections thereof for analysis of the
proteome on an industrial level in a high throughput format. The
methods, capture compounds and collections permit sorting of complex
mixtures of biomolecules. In addition, they permit identification of protein
structures predictive or indicative of specific phenotypes, such as
disease states, thereby eliminating the need for random SNP analysis,
expression profiling and protein analytical methods. The capture
compounds, collections and methods sort complex mixtures by providing
a variety of different capture agents. In addition, they can be used to
identify structural "epitopes" that serve as markers for specific disease
states, stratify individual populations relative to specific phenotypes,
permit a detailed understanding of the proteins underlying molecular
function, and provide targets for drug development. The increased
understanding of target proteins permits the design of higher efficiency
therapeutics.

Capture compounds, collections of the compounds and methods
that use the compounds, singly or in collections thereof, to capture,
separate and analyze biomolecules, including, but not limited to, mixtures
of biomolecules, including biopolymers and macromolecules, such as
proteins, individual biomolecules, such as proteins, including individual or
membrane proteins, are provided. The collections contain a plurality,
generally at least two or three, typically at least 10, 50, 100, 1000 or more
different capture compounds. The compounds and collections are
designed to permit probing of a mixture of biomolecules by virtue of
interaction of the capture compounds in the collection with the
components of the mixture under conditions that preserve their
three-dimensional configuration. Each member of the collection is designed
(1) to bind (either covalently or by other chemical interaction with high
binding affinity (k_a), such that the binding is irreversible or stable under
conditions of mass spectrometric analysis) to fewer than all, typically
about 5 to 20 or more, component biomolecules in a mixture, depending
upon the complexity and diversity of the mixture, under physiological
conditions, including hydrophobic conditions, and (2) to distinguish among
biomolecules based upon topological features. In addition, the capture
compounds generally include a group, such as a single-stranded
oligonucleotide or partially single-stranded oligonucleotide, that permits
separation of each set of capture compounds.

The capture compounds and collections are used in a variety of
methods, but are particularly designed for assessing biomolecules, such
as biopolymer components, in mixtures from biological samples. The
collections are used in top-down unbiased methods that assess structural
changes, including post-translational structural changes and, for example,
are used to compare patterns, particularly post-translational protein
patterns, in diseased versus healthy cells from primary cells, generally
from the same individual. The cells that serve as the sources of
biomolecules can be frozen into a selected metabolic state or
synchronized to permit direct comparison and identification of
phenotype-specific, such as disease-specific, biomolecules, generally proteins.

A capture compound includes a chemical reactivity group X (also
referred to herein as a function or a functionality), which effects the
covalent or high binding affinity (high k_a) binding, and at least one of three
other groups (also referred to herein as functions or functionalities). The
other groups are selected from among a selectivity function Y that
modulates the interaction of a biomolecule with the reactivity function, a
sorting function Q for addressing the components of the collection, and a
solubility function W that alters solubility of the capture compound, such
as by increasing the solubility of the capture compound under selected
conditions, such as various physiological conditions, including
hydrophobic conditions of cell membranes. Hence, for example, if
membrane proteins are targeted, then the capture compounds in the
collection are designed with solubility functions that increase or provide
for solubility in such environment.
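
The composition rule just described (a reactivity function X together
with at least one of Y, Q and W) can be sketched as a small data
structure. This is an illustration only: the class name, the field names
and the example values are hypothetical and are not taken from the
disclosure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CaptureCompound:
    """Hypothetical record of the functions carried by one capture compound."""
    reactivity_x: str                      # required: group that binds the biomolecule
    selectivity_y: Optional[str] = None    # modulates the interaction with X
    sorting_q: Optional[str] = None        # addresses the compound within a collection
    solubility_w: Optional[str] = None     # alters solubility under selected conditions

    def __post_init__(self) -> None:
        # X is mandatory; at least one of Y, Q, W must accompany it.
        if not self.reactivity_x:
            raise ValueError("a capture compound requires a reactivity function X")
        if not any((self.selectivity_y, self.sorting_q, self.solubility_w)):
            raise ValueError("at least one of Y, Q or W must be present")


# Illustrative values only (assumed, not from the text).
compound = CaptureCompound(reactivity_x="amine-reactive group",
                           sorting_q="ACGTACGTAC")
```

Constructing a compound with only X raises a ValueError, which mirrors
the requirement that X be accompanied by at least one other function.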

For example, the reactivity group (reactivity function) includes
groups that specifically react or interact with functionalities on the
surface of a protein such as hydroxyl, amine, amide, sulfide and
carboxylic acid groups, or that recognize specific surface areas, such as
an antibody, a lectin or a receptor-specific ligand, or that interact with the
active site of enzymes. Those skilled in the art can select from a library
of functionalities to accomplish this interaction. While this interaction
can be highly reaction-specific, these compounds can react multiple times
within the same protein molecule depending on the number of
surface-accessible functional groups. Modification of the reaction conditions
allows the identification of surface-accessible functional groups with
differing reactivity, thereby permitting identification of one or more highly
reactive sites used to separate an individual protein from a mixture.
Available technologies do not separate species in the resulting reaction
mixture. The collections and compounds provided herein solve that
problem through a second functionality, the selectivity group, which
alters binding of the reactivity groups to the biomolecule.

Selectivity functions include a variety of groups, as well as the
geometric spacing of the second functionality, a single stranded
unprotected or suitably protected oligonucleotide or oligonucleotide
analog. The selective functionality can be separate from the compound
and include the solid or semi-solid support. The selective functionality in
this embodiment can be porosity, hydrophobicity, charge and other
chemical properties of the material. For example, selectivity functions
interact noncovalently with target proteins to alter the specificity or
binding of the reactivity function. Such functions include chemical
groups and biomolecules that can sterically hinder proteins of specific
size, hydrophilic compounds or proteins (e.g., PEG and trityls),
hydrophobic compounds or proteins (e.g., polar aromatic, lipids,
glycolipids, phosphotriester, oligosaccharides), positively or negatively
charged groups, and groups or biomolecules that create defined secondary
or tertiary structure.

The capture compounds can also include a sorting function for
separation or addressing of each capture compound according to its
structure. The sorting function, for example, can be a single-stranded (or
partially single-stranded) unprotected or suitably protected oligonucleotide
or oligonucleotide analog, typically containing between at least about 5
and up to 25, 35, 50, 100 or any desired number of nucleotides (or
analogs thereof) containing a sequence permuted region and optionally
flanking regions. Each such block has a multitude of sequence
permutations with or without flanking conserved regions, which is
capable of hybridizing with a base-complementary single stranded nucleic
acid molecule or a nucleic acid analog. The sorting function can also be
a label, such as a symbology, including a bar code, particularly a
machine-readable bar code, a color-coded label, such as a small colored
bead that can be sorted by virtue of its color, a radio-frequency tag or
other electronic label or a chemical label. Any functionality that permits
sorting of each set of capture compounds to permit separate analysis of
bound biomolecules is contemplated.
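
The idea of a sequence permuted region with optional conserved flanking
regions can be illustrated by enumerating every tag of a given length.
The following is a minimal sketch under stated assumptions: the function
name, the flank sequences and the four-letter DNA alphabet are chosen
here for illustration and are not specified in the text.

```python
from itertools import product


def sorting_tags(permuted_length, left_flank="", right_flank="", alphabet="ACGT"):
    """Yield every tag: left flank + one sequence permutation + right flank."""
    for block in product(alphabet, repeat=permuted_length):
        yield left_flank + "".join(block) + right_flank


# Hypothetical flanks; a 3-nucleotide permuted region gives 4**3 tags.
tags = list(sorting_tags(3, left_flank="GG", right_flank="CC"))
print(len(tags))   # 64
print(tags[0])     # GGAAACC
```

The number of distinct tags grows as 4 to the power of the permuted
length, so even a short permuted region can address a large collection.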

In certain embodiments, each biomolecule to be captured is
derivatized with more than one capture compound provided herein, where
each tagged compound provides an additional level of sorting capability.
In other embodiments, each of the plurality of compounds that derivatize
a single biomolecule is different, allowing for specific and efficient sorting
of the biomolecule mixture (see, e.g., Figure 3). The capture compound
also can be multifunctional containing other functionalities that can be
used to reduce the complexity of biomolecule mixtures.

Some of the capture compounds include at least a reactivity
function and a selectivity function. These capture compounds optionally
include sorting functionalities, which are one or more additional moieties
that bind either covalently or noncovalently to specific molecules to
permit addressing of the compounds, such as by separation at discrete
loci on a solid support.
These capture compounds also optionally include one or more solubility
functions, which are moieties that influence the solubility of the resulting
compound, to attenuate or alter the hydrophobicity/hydrophilicity of the
compounds.

Others of the capture compounds (or capture agents) include at
least two functional portions: a reactivity function and a sorting function.
One is a reactive group that specifically interacts with proteins or other
biomolecules (reactivity function); the other is an entity (sorting
function) that binds either covalently or noncovalently to a specific
molecule(s). This entity can be a nucleic acid portion or nucleic acid
analog portion that includes a single-stranded region that can specifically
hybridize to a complementary single-stranded oligonucleotide or analog
thereof.

The capture compounds are provided as collections, generally as
collections of sets of different compounds that differ in all functionalities.
For sorting of complex mixtures of biopolymers, the collection includes
diverse capture compound members so that, for example, when they are
arrayed, each locus of the array contains 0 to 100, generally 5 to 50,
desirably 1 to 20, and typically 5 to 20, different biomolecules.

In practice, in one embodiment, a collection of capture compounds
is contacted with a biomolecule mixture and the bound molecules are
assessed using, for example, mass spectrometry, followed by optional
application of tagging, such as fluorescence tagging, after arraying to
identify low abundance proteins. In other embodiments, a single capture
compound is contacted with one or a plurality of biomolecules, and the
bound molecules are assessed.

Also provided herein are methods for the discovery and
identification of proteins, which are selected based on a defined
phenotype. The methods allow proteins to bind to the target molecules
under physiological conditions while maintaining the correct secondary
and tertiary conformation of the target. The methods can be performed
under physiological and other conditions that permit discovery of
biologically important proteins, including membrane proteins, that are
selected based upon a defined phenotype.

Before, during or after exposure of one or a plurality of capture
compounds to a mixture of biomolecules, including, but not limited to, a
mixture of proteins, the oligonucleotide portion, or analog thereof, of
these compounds is allowed to hybridize to a complementary strand of
immobilized oligonucleotide(s), or analog(s) thereof, to allow separation,
isolation and subsequent analysis of bound biomolecules, such as
proteins, by, for example, mass spectrometry, such as matrix assisted
laser desorption ionization-time of flight (MALDI-TOF) mass spectrometry,
colorimetric, fluorescent or chemiluminescent tagging, or to allow for
increased resolution by mass spectrometry, including MALDI-TOF mass
spectrometry.
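
Addressing by hybridization can be pictured as matching each
oligonucleotide tag to the immobilized probe that is its reverse
complement. The sketch below is an idealization that assumes perfect
Watson-Crick pairing over DNA bases only; actual hybridization depends
on conditions that are not modeled here. The function names, locus
labels and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def assign_loci(compound_tags, immobilized_probes):
    """Map each tag to the locus whose probe is its exact reverse complement.

    Tags with no matching probe map to None.
    """
    locus_by_probe = {probe: locus for locus, probe in immobilized_probes.items()}
    return {tag: locus_by_probe.get(reverse_complement(tag)) for tag in compound_tags}


# Illustrative array with two loci and three tagged capture compounds.
probes = {"A1": "GTACC", "A2": "TTTGG"}
loci = assign_loci(["GGTAC", "CCAAA", "ACGTA"], probes)
```

Here the first two tags are assigned to loci A1 and A2, and the third
tag, which has no complementary probe on the array, is left unassigned.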

The collections of capture compounds can be used to generate
compound arrays to capture target proteins or groups of related proteins
that can mimic biological structures such as nuclear and mitochondrial
transmembrane structures, artificial membranes or intact cell walls.
Thus, the compounds and compound arrays provided herein are capable
of mimicking biological entities and biological surfaces, thereby allowing
for capture of biomolecules, including but not limited to proteins, which
would otherwise be difficult or impossible to capture, such as those
found in transmembrane regions of a cell.

Samples for analysis include any biomolecules, particularly
protein-containing samples, such as protein mixtures, including, but not
limited to, natural and synthetic sources. Proteins can be prepared by
translation from isolated chromosomes, genes, cDNA and genomic
libraries. Proteins can be isolated from cells, and other sources. In
certain embodiments, the capture compounds provided herein are
designed to selectively capture different post-translational modifications
of the same protein (i.e., phosphorylation patterns (e.g., oncogenes),
glycosylation and other post-translational modifications).

Other methods that employ the collections are also provided. In
one method, the collections or one or more member capture compounds
are used to distinguish between or among different conformations of a
protein and, for example, can be used for phenotypic identification, such
as for diagnosis. For example, for diseases of protein aggregation, which
are diseases involving a conformationally altered protein, such as amyloid
diseases, the collections can distinguish the disease-involved
form of the protein from the normal protein and thereby diagnose the
disease in a sample.

BRIEF DESCRIPTION OF THE FIGURES

Figure 1 shows the hybridization, separation and mass spectral 
analysis of a mixture of proteins. 

Figure 2 provides a schematic depiction of one embodiment of the 
apparatus provided herein. 
Figure 3 illustrates a protein tagged with four compounds provided
herein, thereby allowing for specific sorting of the protein.

Figure 4 shows the increased and specific hybridization resulting 
from use of two or more oligonucleotide tags. 

Figure 5 shows tagging of a single protein with two different
oligonucleotides in one reaction.

Figure 6 is a flow diagram of recombinant protein production. 

Figure 7 illustrates production of an adapted oligonucleotide dT 
primed cDNA library. 

Figure 8 shows production of an adapted sequence motif specific
cDNA library.

Figure 9 shows production of an adapted gene specific cDNA.

Figure 10 illustrates purification of amplification products from a
template library.

Figure 11 shows an adapted oligonucleotide dT primed cDNA
library as a universal template for the amplification of gene
subpopulations.

Figure 12 illustrates decrease of complexity during PCR 
amplification. 

Figure 13 shows the attachment of a bifunctional molecule to a
solid surface.

Figure 14 shows analysis of purified proteins from compound 
screening and antibody production. 

Figure 15 provides synthetic schemes for synthesis of exemplary
capture reagents provided herein (see, e.g., Example 4).

Figure 16 provides exemplary reactivity functions for use in the
capture reagents provided herein.

Figure 17 provides exemplary selectivity functions for use in the 
capture reagents provided herein. 
Figure 18 depicts exemplary points for regulation of metabolic
control mechanisms for cell synchronization.

Figures 19a-19c depict cell separation and synchronization methods;
Figure 19a depicts methods for separation of cells from blood from a single
patient to separate them by phenotype; Figure 19b shows the results of
flow cytometry separation of blood cells without labeling; Figure 19c
shows an example in which synchronized cells in culture are sorted
according to DNA content as a way to separate cells by phase of the
cell cycle.

Figure 20 shows a schematic of a biomolecule capture assay and
results using exemplary capture compounds and proteins.
DETAILED DESCRIPTION 
A. Definitions 

Unless defined otherwise, all technical and scientific terms used
herein have the same meaning as is commonly understood by one of skill
in the art to which the invention(s) belong. All patents, patent
applications, published applications and publications, Genbank sequences,
websites and other published materials referred to throughout the entire
disclosure herein, unless noted otherwise, are incorporated by reference
in their entirety. In the event that there are a plurality of definitions for
terms herein, those in this section prevail. Where reference is made to a
URL or other such identifier or address, it is understood that such
identifiers can change and particular information on the internet can come
and go, but equivalent information can be found by searching the
internet. Reference thereto evidences the availability and public
dissemination of such information.

As used herein, an oligonucleotide means a linear sequence of up
to about 20, about 50, or about 100, nucleotides joined by
phosphodiester bonds. Above this length the term polynucleotide begins
to be used.

As used herein, an oligonucleotide analog means a linear sequence
of up to about 20, about 50, or about 100, nucleotide analogs, or a linear
sequence of up to about 20, about 50, or about 100 nucleotides linked
by a "backbone" bond other than a phosphodiester bond, for example, a
phosphotriester bond, a phosphoramidate bond, a phosphorothioate bond,
a methylphosphonate diester bond, a thioester bond, or a peptide bond
(peptide nucleic acid).

As used herein, peptide nucleic acid (PNA) refers to nucleic acid
analogs in which the ribose-phosphate backbone is replaced by a backbone
held together by amide bonds.

As used herein, proteome means all the proteins present within a
cell.

As used herein, a biomolecule is any compound found in nature, or
derivatives thereof. Biomolecules include, but are not limited to,
oligonucleotides, oligonucleosides, proteins, peptides, amino acids, lipids,
steroids, peptide nucleic acids (PNAs), oligosaccharides and
monosaccharides.

As used herein, MALDI-TOF refers to matrix assisted laser
desorption ionization-time of flight mass spectrometry.

As used herein, the term "conditioned" or "conditioning," when
used in reference to a protein, means that the protein is
modified to decrease the laser energy required to volatilize the protein, to
minimize the likelihood of fragmentation of the protein, or to increase the
resolution of a mass spectrum of the protein or of the component amino
acids. Resolution of a mass spectrum of a protein can be increased by
conditioning the protein prior to performing mass spectrometry.
Conditioning can be performed at any stage prior to mass spectrometry
and, in one embodiment, is performed while the protein is immobilized. A
protein can be conditioned, for example, by treating it with a cation
exchange material or an anion exchange material, which can reduce the
charge heterogeneity of the protein, thereby eliminating peak
broadening due to heterogeneity in the number of cations (or anions)
bound to the various proteins in a population. In one embodiment, removal
of all cations by ion exchange, except for H+ and ammonium ions, is
performed. By contacting a polypeptide with an alkylating agent such as
alkyliodide, iodoacetamide, iodoethanol, or 2,3-epoxy-1-propanol, the
formation of disulfide bonds, for example, in a protein can be prevented.
Likewise, charged amino acid side chains can be converted to uncharged
derivatives employing trialkylsilyl chlorides.

Since the capture compounds contain protein and nucleic acid
portions, conditioning suitable for one or both portions is also
contemplated. Hence, a prepurification to enrich the biomolecules to be
analyzed and the removal of all cations, such as by ion exchange, except
for H+ and ammonium, or other conditioning treatment to improve
resolution is advantageous for analysis of the nucleic acid portion as well
as the protein portion.

Conditioning of proteins is generally unnecessary because proteins
are relatively stable under acidic, high energy conditions so that proteins
do not require conditioning for mass spectrometric analyses. There are
means of improving resolution, however, in one embodiment for shorter
peptides, such as by incorporating modified amino acids that are more
basic than the corresponding unmodified residues. Such modification in
general increases the stability of the polypeptide during mass
spectrometric analysis. Also, cation exchange chromatography, as well
as general washing and purification procedures that remove other proteins
and reaction mixture components from the protein of interest, can be used to
increase the resolution of the spectrum resulting from mass spectrometric
analysis of the protein.

As used herein, "matrix" refers to the material with which the
capture compound-biomolecule conjugates are combined for MALDI mass
spectrometric analysis. Any matrix material, such as solid acids,
including 3-hydroxypicolinic acid, and liquid matrices, such as glycerol, known
to those of skill in the art for nucleic acid and/or protein analyses is
contemplated. Since the compound-biomolecule conjugates contain
nucleic acid and protein, a mixture of matrix molecules (optimal for
nucleic acids and proteins) can be used.

As used herein, macromolecule refers to any molecule having a
molecular weight from the hundreds up to the millions. Macromolecules
include, but are not limited to, peptides, proteins, nucleotides, nucleic
acids, carbohydrates, and other such molecules that are generally
synthesized by biological organisms, but can be prepared synthetically or
using recombinant molecular biology methods.

As used herein, the term "biopolymer" refers to a biological
molecule, including macromolecules, composed of two or more
monomeric subunits, or derivatives thereof, which are linked by a bond or
a macromolecule. A biopolymer can be, for example, a polynucleotide, a
polypeptide, a carbohydrate, or a lipid, or derivatives or combinations
thereof, for example, a nucleic acid molecule containing a peptide nucleic
acid portion or a glycoprotein. The methods and collections herein,
though described with reference to biopolymers, can be adapted for use
with other synthetic schemes and assays, such as organic syntheses of
pharmaceuticals, or inorganics and any other reaction or assay performed
on a solid support or in a well in nanoliter or smaller volumes.

As used herein, biomolecule includes biopolymers and
macromolecules and all molecules that can be isolated from living
organisms and viruses, including, but not limited to, cells, tissues,
prions, animals, plants, viruses, bacteria and other organisms.

As used herein, a biological particle refers to a virus, such as a viral
vector or viral capsid with or without packaged nucleic acid, phage,
including a phage vector or phage capsid, with or without encapsulated
nucleic acid, a single cell, including eukaryotic and prokaryotic cells or
fragments thereof, a liposome or micellar agent or other packaging
particle, and other such biological materials. For purposes herein,
biological particles include molecules that are not typically considered
macromolecules because they are not generally synthesized, but are
derived from cells and viruses.

As used herein, a drug refers to any compound that is a candidate
for use as a therapeutic or as a lead compound for designing a therapeutic
or that is a known pharmaceutical. Such compounds can be small
molecules, including small organic molecules, peptides, peptide mimetics,
antisense molecules, antibodies, fragments of antibodies and recombinant
antibodies. Of particular interest are "drugs" that have specific binding
properties so that they can be used as selectivity groups or can be used
for sorting of the capture compounds, either as a sorting functionality
that binds to a target on a support, or linked to a solid support, where the
sorting functionality is the drug target.

As used herein, the term "nucleic acid" refers to single-stranded
and/or double-stranded polynucleotides such as deoxyribonucleic acid
(DNA) and ribonucleic acid (RNA), as well as analogs or derivatives of
either RNA or DNA. Nucleic acid molecules are linear polymers of
nucleotides, linked by 3', 5' phosphodiester linkages. In DNA,
deoxyribonucleic acid, the sugar group is deoxyribose and the bases of
the nucleotides are adenine, guanine, thymine and cytosine. RNA,
ribonucleic acid, has ribose as the sugar and uracil replaces thymine.
Also included in the term "nucleic acid" are analogs of nucleic acids such
as peptide nucleic acid (PNA), phosphorothioate DNA, and other such
analogs and derivatives or combinations thereof.

As used herein, the term "polynucleotide" refers to an oligomer or
polymer containing at least two linked nucleotides or nucleotide
derivatives, including a deoxyribonucleic acid (DNA), a ribonucleic acid
(RNA), and a DNA or RNA derivative containing, for example, a
nucleotide analog or a "backbone" bond other than a phosphodiester
bond, for example, a phosphotriester bond, a phosphoramidate bond, a
methylphosphonate diester bond, a phosphorothioate bond, a thioester
bond, or a peptide bond (peptide nucleic acid). The term
"oligonucleotide" also is used herein essentially synonymously with
"polynucleotide," although those in the art recognize that
oligonucleotides, for example, PCR primers, generally are less than about
fifty to one hundred nucleotides in length.

Nucleotide analogs contained in a polynucleotide can be, for
example, mass modified nucleotides, which allow for mass
differentiation of polynucleotides; nucleotides containing a detectable
label such as a fluorescent, radioactive, colorimetric, luminescent or
chemiluminescent label, which allows for detection of a polynucleotide; or
nucleotides containing a reactive group such as biotin or a thiol group,
which facilitates immobilization of a polynucleotide to a solid support. A
polynucleotide also can contain one or more backbone bonds that are
selectively cleavable, for example, chemically, enzymatically or
photolytically. For example, a polynucleotide can include one or more
deoxyribonucleotides, followed by one or more ribonucleotides, which
can be followed by one or more deoxyribonucleotides, such a sequence
being cleavable at the ribonucleotide sequence by base hydrolysis. A
polynucleotide also can contain one or more bonds that are relatively
resistant to cleavage, for example, a chimeric oligonucleotide primer,
which can include nucleotides linked by peptide nucleic acid bonds and at
least one nucleotide at the 3' end, which is linked by a phosphodiester
bond, or the like, and is capable of being extended by a polymerase.
Peptide nucleic acid sequences can be prepared using well known
methods (see, for example, Weiler et al. (1997) Nucleic Acids Res.
25:2792-2799).

A polynucleotide can be a portion of a larger nucleic acid molecule,
for example, a portion of a gene, which can contain a polymorphic region,
or a portion of an extragenic region of a chromosome, for example, a
portion of a region of nucleotide repeats such as a short tandem repeat
(STR) locus, a variable number of tandem repeats (VNTR) locus, a
microsatellite locus or a minisatellite locus. A polynucleotide also can be
single stranded or double stranded, including, for example, a DNA-RNA
hybrid, or can be triple stranded or four stranded. Where the
polynucleotide is double stranded DNA, it can be in an A, B, L or Z
configuration, and a single polynucleotide can contain combinations of
such configurations.

As used herein, a "mass modification," with respect to a
biomolecule to be analyzed by mass spectrometry, refers to the inclusion
of changes in constituent atoms or groups that change the molecular
weight of the resulting molecule in defined increments detectable by
mass spectrometric analysis. Mass modifications do not include
radiolabels, such as isotope labels, or fluorescent groups or other such
tags normally used for detection by means other than mass spectrometry.

As used herein, the term "polypeptide" means at least two amino
acids, or amino acid derivatives, including mass modified amino acids and
amino acid analogs, which are linked by a peptide bond and which can be
a modified peptide bond. A polypeptide can be translated from a
polynucleotide, which can include at least a portion of a coding sequence, or
a portion of a nucleotide sequence that is not naturally translated due, for
example, to it being located in a reading frame other than a coding frame,
or it being an intron sequence, a 3' or 5' untranslated sequence, or a
regulatory sequence such as a promoter. A polypeptide also can be
chemically synthesized and can be modified by chemical or enzymatic
methods following translation or chemical synthesis. The terms
"polypeptide," "peptide" and "protein" are used essentially synonymously
herein, although the skilled artisan recognizes that peptides generally
contain fewer than about fifty to one hundred amino acid residues, and
that proteins often are obtained from a natural source and can contain,
for example, post-translational modifications. A polypeptide can be
post-translationally modified by, for example, phosphorylation
(phosphoproteins) or glycosylation (glycoproteins, proteoglycans), which
can be performed in a cell or in a reaction in vitro.

As used herein, the term "conjugated" refers to stable attachment,
typically by virtue of a chemical interaction, including ionic and/or
covalent attachment. Among the conjugation means are streptavidin- or
avidin-to-biotin interaction; hydrophobic interaction; magnetic interaction
(e.g., using functionalized magnetic beads, such as DYNABEADS, which
are streptavidin-coated magnetic beads sold by Dynal, Inc., Great Neck,
NY and Oslo, Norway); polar interactions, such as "wetting" associations
between two polar surfaces or between oligo/polyethylene glycol;
formation of a covalent bond, such as an amide bond, disulfide bond,
thioether bond, or via crosslinking agents; and via an acid-labile or
photocleavable linker.

As used herein, "sample" refers to a composition containing a
material to be detected. For purposes herein, sample refers to anything that
can contain a biomolecule. The sample can be a biological sample, such
as a biological fluid or a biological tissue obtained from any organism or a
cell of or from an organism or a viral particle or portions thereof.
Examples of biological fluids include urine, blood, plasma, serum, saliva,
semen, stool, sputum, cerebral spinal fluid, tears, mucus, sperm, amniotic
fluid or the like. Biological tissues are aggregates of cells, usually of a
particular kind, together with their intercellular substance, that form one of
the structural materials of a human, animal, plant, bacterial, fungal or viral
structure, including connective, epithelium, muscle and nerve tissues.
Examples of biological tissues also include organs, tumors, lymph nodes,
arteries and individual cell(s).

Thus, samples include biological samples (e.g., any material
obtained from a source originating from a living being (e.g., human,
animal, plant, bacteria, fungi, protist, virus)). The biological sample can be
in any form, including solid materials (e.g., tissue, cell pellets and
biopsies, tissues from cadavers) and biological fluids (e.g., urine, blood,
saliva, amniotic fluid and mouth wash (containing buccal cells)). In
certain embodiments, solid materials are mixed with a fluid. In
embodiments herein, a sample for mass spectrometric analysis
includes samples that contain a mixture of matrix used for mass
spectrometric analyses and the capture compound/biomolecule
complexes.

As used herein, the term "solid support" means a non-gaseous,
non-liquid material having a surface. Thus, a solid support can be a flat
surface constructed, for example, of glass, silicon, metal, plastic or a
composite; or can be in the form of a bead such as a silica gel, a
controlled pore glass, a magnetic or cellulose bead; or can be a pin,
including an array of pins suitable for combinatorial synthesis or analysis.

As used herein, a collection refers to a combination of two or more
members, generally 3, 5, 10, 50, 100, 500, 1000 or more members. In
particular, a collection refers to such a combination of the capture
compounds as provided herein.

As used herein, an array refers to a collection of elements, such as
the capture compounds, containing three or more members. An
addressable array is one in which the members of the array are identifiable,
typically by position on a solid phase support but also by virtue of an
identifier or detectable label. Hence, in general the members of an array
are immobilized to discrete identifiable loci on the surface of a solid
phase. A plurality of the compounds are attached to a support, such
as an array (i.e., a pattern of two or more) on the surface of a support,
such as a silicon chip or other surface, generally through binding of the
sorting functionality with a group or compound on the surface of the
support. Addressing can be achieved by labeling each member
electronically, such as with a radio-frequency (RF) tag, through the use
of color coded beads or other such identifiable and color coded labels, and
through molecular weight. These labels for addressing serve as sorting
functions "Q." Hence, in general the members of the array are
immobilized to discrete identifiable loci on the surface of a solid phase or
directly or indirectly linked to or otherwise associated with the identifiable
label, such as affixed to a microsphere or other particulate support (herein
referred to as beads) and suspended in solution or spread out on a
surface.

As used herein, "substrate" refers to an insoluble support onto
which a sample and/or matrix is deposited. The support can be fabricated
from virtually any insoluble or solid material, for example, silica gel,
glass (e.g., controlled-pore glass (CPG)), nylon, Wang resin, Merrifield
resin, dextran cross-linked with epichlorohydrin (e.g., Sephadex®),
agarose (e.g., Sepharose®), cellulose, magnetic beads, Dynabeads, a
metal surface (e.g., steel, gold, silver, aluminum, silicon and copper), or a
plastic material (e.g., polyethylene, polypropylene, polyamide, polyester,
polyvinylidenedifluoride (PVDF)). Exemplary substrates include, but are not
limited to, beads (e.g., silica gel, controlled pore glass, magnetic, dextran
cross-linked with epichlorohydrin (e.g., Sephadex®), agarose (e.g.,
Sepharose®), cellulose), capillaries, flat supports such as glass fiber filters,
glass surfaces, metal surfaces (steel, gold, silver, aluminum, copper and
silicon), plastic materials including multiwell plates or membranes (e.g., of
polyethylene, polypropylene, polyamide, polyvinylidenedifluoride), pins
(e.g., arrays of pins suitable for combinatorial synthesis or analysis) or
beads in pits of flat surfaces such as wafers (e.g., silicon wafers) with or
without filter plates. The solid support is in any desired form, including,
but not limited to, a bead, capillary, plate, membrane, wafer, comb, pin, a
wafer with pits, an array of pits or nanoliter wells and other geometries
and forms known to those of skill in the art. Supports include flat
surfaces designed to receive or link samples at discrete loci. In one
embodiment, flat surfaces include those with hydrophobic regions
surrounding hydrophilic loci for receiving, containing or binding a sample.

The supports can be particulate or can be in the form of a
continuous surface, such as a microtiter dish or well, a glass slide, a
silicon chip, a nitrocellulose sheet, nylon mesh, or other such materials.
When particulate, typically the particles have at least one dimension in
the 5-10 mm range or smaller. Such particles, referred to collectively herein
as "beads", are often, but not necessarily, spherical. Reference to
"bead," however, does not constrain the geometry of the matrix, which
can be any shape, including random shapes, needles, fibers, and
elongated shapes. "Beads", particularly microspheres that are sufficiently small
to be used in the liquid phase, are also contemplated. The "beads" can
include additional components, such as magnetic or paramagnetic
particles (see, e.g., Dynabeads (Dynal, Oslo, Norway)) for separation
using magnets, as long as the additional components do not interfere
with the methods and analyses herein.

As used herein, "polymorphism" refers to the coexistence of more
than one form of a gene or portion thereof. A portion of a gene of which
there are at least two different forms, e.g., two different nucleotide
sequences, is referred to as a "polymorphic region of a gene". A
polymorphic region can be a single nucleotide, e.g., a single nucleotide
polymorphism (SNP), the identity of which differs in different alleles. A
polymorphic region also can be several nucleotides in length.

As used herein, "polymorphic gene" refers to a gene having at
least one polymorphic region.

As used herein, "allele", which is used interchangeably herein with
"allelic variant", refers to alternative forms of a gene or portions thereof.
Alleles occupy the same locus or position on homologous chromosomes.
When a subject has two identical alleles of a gene, the subject is said to
be homozygous for the gene or allele. When a subject has two different
alleles of a gene, the subject is said to be heterozygous for the gene.
Alleles of a specific gene can differ from each other in a single nucleotide,
or several nucleotides, and can include substitutions, deletions, and
insertions of nucleotides. An allele of a gene also can be a form of a gene
containing a mutation.

As used herein, "predominant allele" refers to an allele that is 
represented in the greatest frequency for a given population. The allele or 
alleles that are present in lesser frequency are referred to as allelic 
variants. 
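
The allele definitions above can be illustrated with a minimal Python sketch. The function names `zygosity` and `predominant_allele` are hypothetical, alleles are represented simply as strings, and ties in frequency are broken by first occurrence, which is an assumption of this sketch rather than anything stated in the text.

```python
from collections import Counter


def zygosity(allele_a: str, allele_b: str) -> str:
    """Classify a subject from the two alleles carried at one locus."""
    return "homozygous" if allele_a == allele_b else "heterozygous"


def predominant_allele(observed_alleles):
    """Return (allele, frequency) for the most frequent allele in a sample.

    Assumption: on a tie, the allele seen first in the input is returned.
    """
    counts = Counter(observed_alleles)
    allele, n = counts.most_common(1)[0]
    return allele, n / len(observed_alleles)


if __name__ == "__main__":
    print(zygosity("A", "G"))
    print(predominant_allele(["A", "A", "G", "A"]))
```

Under these definitions, every allele other than the one returned by `predominant_allele` would be counted among the less frequent allelic variants.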

As used herein, "associated" refers to coincidence with the
development or manifestation of a disease, condition or phenotype.
Association can be due to, but is not limited to, genes responsible for
housekeeping functions whose alteration can provide the foundation for a
variety of diseases and conditions, those that are part of a pathway that
is involved in a specific disease, condition or phenotype and those that
indirectly contribute to the manifestation of a disease, condition or
phenotype.

As used herein, the term "subject" refers to a living organism, such
as a mammal, a plant, a fungus, an invertebrate, a fish, an insect, or a
pathogenic organism, such as a virus or a bacterium, and includes
humans and other mammals.

As used herein, the term "gene" or "recombinant gene" refers to a
nucleic acid molecule containing an open reading frame and including at
least one exon and (optionally) an intron sequence. A gene can be either
RNA or DNA. Genes can include regions preceding and following the
coding region.

As used herein, "intron" refers to a DNA fragment present in a
given gene that is spliced out during mRNA maturation.

As used herein, "nucleotide sequence complementary to the
nucleotide sequence set forth in SEQ ID NO: x" refers to the nucleotide
sequence of the complementary strand of a nucleic acid strand having
SEQ ID NO: x. The term "complementary strand" is used herein
interchangeably with the term "complement". The complement of a
nucleic acid strand can be the complement of a coding strand or the
complement of a non-coding strand. When referring to double stranded
nucleic acids, the complement of a nucleic acid having SEQ ID NO: x
refers to the complementary strand of the strand having SEQ ID NO: x or
to any nucleic acid having the nucleotide sequence of the complementary
strand of SEQ ID NO: x. When referring to a single stranded nucleic acid
having the nucleotide sequence SEQ ID NO: x, the complement of this
nucleic acid is a nucleic acid having a nucleotide sequence that is
complementary to that of SEQ ID NO: x.
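
As an illustration of the "complement" of a strand, the following Python sketch derives the complementary strand of a DNA sequence. It assumes a plain A/C/G/T alphabet and the usual convention of writing both strands 5' to 3', so the result is the reverse complement; the function name `complement_strand` is hypothetical.

```python
# Watson-Crick pairing for DNA; upper and lower case are both handled.
_PAIRING = str.maketrans("ACGTacgt", "TGCAtgca")


def complement_strand(seq: str) -> str:
    """Return the complementary strand of `seq`, written 5'->3'.

    Assumption: `seq` is given 5'->3' and contains only A, C, G and T.
    """
    # Pair each base, then reverse so the result also reads 5'->3'.
    return seq.translate(_PAIRING)[::-1]


if __name__ == "__main__":
    strand = "ATGC"
    print(complement_strand(strand))
    # Taking the complement twice recovers the original strand.
    print(complement_strand(complement_strand(strand)) == strand)
```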

As used herein, the term "coding sequence" refers to that portion
of a gene that encodes the amino acids that constitute a polypeptide or
protein.

As used herein, the term "sense strand" refers to that strand of a
double-stranded nucleic acid molecule that has the sequence of the
mRNA that encodes the amino acid sequence encoded by the
double-stranded nucleic acid molecule.

As used herein, the term "antisense strand" refers to that strand of 
a double-stranded nucleic acid molecule that is the complement of the 
sequence of the mRNA that encodes the amino acid sequence encoded 
by the double-stranded nucleic acid molecule.

As used herein, the amino acids, which occur in the various amino
acid sequences appearing herein, are identified according to their
well-known, three-letter or one-letter abbreviations. The nucleotides, which
occur in the various DNA fragments, are designated with the standard
single-letter designations used routinely in the art (see Table 1).

As used herein, amino acid residue refers to an amino acid formed
upon chemical digestion (hydrolysis) of a polypeptide at its peptide
linkages. The amino acid residues described herein are, in certain
embodiments, in the "L" isomeric form. Residues in the "D" isomeric
form can be substituted for any L-amino acid residue, as long as the
desired functional property is retained by the polypeptide. NH2 refers to
the free amino group present at the amino terminus of a polypeptide.
COOH refers to the free carboxy group present at the carboxyl terminus
of a polypeptide. In keeping with standard polypeptide nomenclature
described in J. Biol. Chem., 243:3552-59 (1969) and adopted at 37
C.F.R. §§ 1.821-1.822, abbreviations for amino acid residues are
shown in the following Table:



Table 1

Table of Correspondence

SYMBOL
1-Letter    3-Letter    AMINO ACID
Y           Tyr         tyrosine
G           Gly         glycine
F           Phe         phenylalanine
M           Met         methionine
A           Ala         alanine
S           Ser         serine
I           Ile         isoleucine
L           Leu         leucine
T           Thr         threonine
V           Val         valine
P           Pro         proline
K           Lys         lysine
H           His         histidine
Q           Gln         glutamine
E           Glu         glutamic acid
Z           Glx         Glu and/or Gln
W           Trp         tryptophan
R           Arg         arginine
D           Asp         aspartic acid
N           Asn         asparagine
B           Asx         Asn and/or Asp
C           Cys         cysteine
X           Xaa         Unknown or other
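
The correspondence in Table 1 lends itself to a simple lookup. The Python sketch below encodes the table as a dictionary and converts a one-letter sequence to three-letter codes; the names `ONE_TO_THREE` and `to_three_letter` and the hyphen-joined output format are illustrative assumptions, not part of the specification.

```python
# One-letter to three-letter symbols, transcribed from Table 1.
ONE_TO_THREE = {
    "Y": "Tyr", "G": "Gly", "F": "Phe", "M": "Met", "A": "Ala",
    "S": "Ser", "I": "Ile", "L": "Leu", "T": "Thr", "V": "Val",
    "P": "Pro", "K": "Lys", "H": "His", "Q": "Gln", "E": "Glu",
    "Z": "Glx", "W": "Trp", "R": "Arg", "D": "Asp", "N": "Asn",
    "B": "Asx", "C": "Cys", "X": "Xaa",
}


def to_three_letter(sequence: str) -> str:
    """Convert a one-letter sequence (amino to carboxyl terminus) to
    hyphen-separated three-letter codes. Raises KeyError for symbols
    that do not appear in Table 1."""
    return "-".join(ONE_TO_THREE[residue] for residue in sequence.upper())


if __name__ == "__main__":
    print(to_three_letter("MAG"))
```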



It should be noted that all amino acid residue sequences 
represented herein by formulae have a left to right orientation in the 
conventional direction of amino-terminus to carboxyl-terminus. In 
addition, the phrase "amino acid residue" is broadly defined to include the 
amino acids listed in the Table of Correspondence and modified and 
unusual amino acids, such as those referred to in 37 C.F.R. §§ 1.821-
1.822, and incorporated herein by reference. Furthermore, it should be
noted that a dash at the beginning or end of an amino acid residue
sequence indicates a peptide bond to a further sequence of one or more
amino acid residues or to an amino-terminal group such as NH2 or to a
carboxyl-terminal group such as COOH.

In a peptide or protein, suitable conservative substitutions of amino
acids are known to those of skill in this art and can be made generally
without altering the biological activity of the resulting molecule. Those of
skill in this art recognize that, in general, single amino acid substitutions
in non-essential regions of a polypeptide do not substantially alter
biological activity (see, e.g., Watson et al. Molecular Biology of the Gene,
4th Edition, 1987, The Benjamin/Cummings Pub. Co., p. 224).

Such substitutions can be made in accordance with those set forth
in TABLE 2 as follows:




TABLE 2

Original residue    Conservative substitution
Ala (A)             Gly; Ser
Arg (R)             Lys
Asn (N)             Gln; His
Asp (D)             Glu
Cys (C)             Ser
Gln (Q)             Asn
Glu (E)             Asp
Gly (G)             Ala; Pro
His (H)             Asn; Gln
Ile (I)             Leu; Val
Leu (L)             Ile; Val
Lys (K)             Arg; Gln
Met (M)             Leu; Tyr; Ile
Phe (F)             Met; Leu; Tyr
Ser (S)             Thr
Thr (T)             Ser
Trp (W)             Tyr
Tyr (Y)             Trp; Phe
Val (V)             Ile; Leu



Other substitutions are also permissible and can be determined empirically 
or in accord with known conservative substitutions. 
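
A lookup against TABLE 2 can be sketched in Python as follows. The dictionary transcribes the table as read from the text; the names `CONSERVATIVE` and `is_conservative` are hypothetical. The table is treated as directional (a listed substitution of Met by Tyr does not imply Tyr by Met), which is an assumption of this sketch, and substitutions determined empirically are outside its scope.

```python
# Original residue -> residues listed as conservative substitutions in TABLE 2.
CONSERVATIVE = {
    "Ala": {"Gly", "Ser"}, "Arg": {"Lys"}, "Asn": {"Gln", "His"},
    "Asp": {"Glu"}, "Cys": {"Ser"}, "Gln": {"Asn"}, "Glu": {"Asp"},
    "Gly": {"Ala", "Pro"}, "His": {"Asn", "Gln"}, "Ile": {"Leu", "Val"},
    "Leu": {"Ile", "Val"}, "Lys": {"Arg", "Gln"},
    "Met": {"Leu", "Tyr", "Ile"}, "Phe": {"Met", "Leu", "Tyr"},
    "Ser": {"Thr"}, "Thr": {"Ser"}, "Trp": {"Tyr"},
    "Tyr": {"Trp", "Phe"}, "Val": {"Ile", "Leu"},
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if TABLE 2 lists `replacement` as a substitution for `original`."""
    return replacement in CONSERVATIVE.get(original, set())


if __name__ == "__main__":
    print(is_conservative("Met", "Tyr"))
    print(is_conservative("Tyr", "Met"))
```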

As used herein, a DNA or nucleic acid homolog refers to a nucleic
acid that includes a preselected conserved nucleotide sequence, such as
a sequence encoding a therapeutic polypeptide. By the term
"substantially homologous" is meant having at least 80%, at least 90%, or
at least 95% homology therewith, or a lesser percentage of homology or
identity and conserved biological activity or function.

The terms "homology" and "identity" are often used
interchangeably. In this regard, percent homology or identity can be
determined, for example, by comparing sequence information using a
GAP computer program. The GAP program uses the alignment method of
Needleman and Wunsch (J. Mol. Biol. 48:443 (1970)), as revised by
Smith and Waterman (Adv. Appl. Math. 2:482 (1981)). Briefly, the GAP
program defines similarity as the number of aligned symbols (e.g.,
nucleotides or amino acids) that are similar, divided by the total number
of symbols in the shorter of the two sequences. The default parameters
for the GAP program can include: (1) a unary comparison matrix
(containing a value of 1 for identities and 0 for non-identities) and the
weighted comparison matrix of Gribskov and Burgess, Nucl. Acids Res.
14:6745 (1986), as described by Schwartz and Dayhoff, eds., ATLAS OF
PROTEIN SEQUENCE AND STRUCTURE, National Biomedical Research
Foundation, pp. 353-358 (1979); (2) a penalty of 3.0 for each gap and
an additional 0.10 penalty for each symbol in each gap; and (3) no
penalty for end gaps.
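
The similarity measure and gap penalty described above can be sketched in Python. This is not the GAP program itself: the sketch assumes the two sequences have already been aligned (equal-length strings with "-" marking gap positions), uses the unary comparison matrix (1 for identities, 0 otherwise), and the function names are hypothetical.

```python
def gap_similarity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Similar aligned symbols divided by the length of the shorter sequence.

    Assumption: inputs are pre-aligned, equal-length strings; with the
    unary matrix, "similar" means identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    similar = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != gap)
    # Sequence lengths are counted without gap characters.
    shorter = min(len(aligned_a.replace(gap, "")),
                  len(aligned_b.replace(gap, "")))
    return similar / shorter


def gap_penalty(symbols_in_gap: int, per_gap: float = 3.0,
                per_symbol: float = 0.10) -> float:
    """Penalty for one internal gap: 3.0 plus 0.10 per symbol in the gap.

    End gaps carry no penalty under the defaults described in the text
    and are not modeled here.
    """
    return per_gap + per_symbol * symbols_in_gap


if __name__ == "__main__":
    print(gap_similarity("ACGT-A", "ACCTGA"))
    print(gap_penalty(2))
```

In the example call, four of the aligned positions are identical and the shorter sequence has five symbols, so the similarity is 4/5.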

Whether any two nucleic acid molecules have nucleotide
sequences that are at least 80%, 85%, 90%, 95%, 96%, 97%, 98% or
99% "identical" can be determined using known computer algorithms
such as the "FASTA" program, using, for example, the default
parameters as in Pearson and Lipman, Proc. Natl. Acad. Sci. USA
85:2444 (1988). Alternatively, the BLAST function of the National Center
for Biotechnology Information database can be used to determine identity.
In general, sequences are aligned so that the highest order match
is obtained. "Identity" per se has an art-recognized meaning and can be
calculated using published techniques. (See, e.g.: Computational
Molecular Biology, Lesk, A.M., ed., Oxford University Press, New York,
1988; Biocomputing: Informatics and Genome Projects, Smith, D.W., ed.,
Academic Press, New York, 1993; Computer Analysis of Sequence Data,
Part I, Griffin, A.M., and Griffin, H.G., eds., Humana Press, New Jersey,
1994; Sequence Analysis in Molecular Biology, von Heijne, G., Academic
Press, 1987; and Sequence Analysis Primer, Gribskov, M. and Devereux,
J., eds., M Stockton Press, New York, 1991). While there exist a
number of methods to measure identity between two polynucleotide or
polypeptide sequences, the term "identity" is well known to skilled
artisans (Carillo, H. & Lipton, D., SIAM J Applied Math 48:1073 (1988)).
Methods commonly employed to determine identity or similarity between
two sequences include, but are not limited to, those disclosed in Guide to
Huge Computers, Martin J. Bishop, ed., Academic Press, San Diego,
1994, and Carillo, H. & Lipton, D., SIAM J Applied Math 48:1073
(1988). Methods to determine identity and similarity are codified in
computer programs. Computer program methods to determine identity
and similarity between two sequences include, but are not limited to,
GCG program package (Devereux, J., et al., Nucleic Acids Research
12(1):387 (1984)), BLASTP, BLASTN, FASTA (Altschul, S.F., et al., J
Molec Biol 215:403 (1990)).

Therefore, as used herein, the term "identity" represents a
comparison between a test and a reference polypeptide or polynucleotide.
For example, a test polypeptide can be defined as any polypeptide that is
90% or more identical to a reference polypeptide.

As used herein, the term at least "90% identical to" refers to
percent identities from 90 to 99.99 relative to the reference polypeptides.
Identity at a level of 90% or more is indicative of the fact that, assuming
for exemplification purposes that a test and a reference polypeptide with a
length of 100 amino acids are compared, no more than 10% (i.e., 10 out of
100) of the amino acids in the test polypeptide differ from those of the
reference polypeptide. Similar comparisons can be made between test and
reference polynucleotides. Such differences can be represented as point
mutations randomly distributed over the entire length of an amino acid
sequence or they can be clustered in one or more locations of varying
length up to the maximum allowable, e.g., 10/100 amino acid difference
(approximately 90% identity). Differences are defined as nucleic acid or
amino acid substitutions, or deletions.
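
The 100-residue example above can be worked through with a short Python sketch. It assumes the test and reference sequences are already aligned and of equal length, so that differences are simple position-by-position mismatches; deletions would require a prior alignment step that is not shown. The function names are hypothetical.

```python
def percent_identity(test: str, reference: str) -> float:
    """Percentage of positions at which `test` matches `reference`.

    Assumption: both sequences are pre-aligned and of equal length.
    """
    if len(test) != len(reference):
        raise ValueError("sequences must have equal length")
    same = sum(1 for t, r in zip(test, reference) if t == r)
    return 100.0 * same / len(reference)


def is_at_least_90_percent_identical(test: str, reference: str) -> bool:
    """True when no more than 10% of positions differ."""
    return percent_identity(test, reference) >= 90.0


if __name__ == "__main__":
    reference = "A" * 100
    clustered = "G" * 10 + "A" * 90  # ten differences clustered at one end
    print(percent_identity(clustered, reference))
    print(is_at_least_90_percent_identical(clustered, reference))
```

Whether the ten differences are clustered, as here, or scattered along the sequence makes no difference to the computed percentage.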

As used herein, stringency of hybridization in determining
percentage mismatch is as follows:

1) high stringency: 0.1 x SSPE, 0.1% SDS, 65°C

2) medium stringency: 0.2 x SSPE, 0.1% SDS, 50°C

3) low stringency: 1.0 x SSPE, 0.1% SDS, 50°C

Those of skill in this art know that the washing step selects for
stable hybrids and also know the ingredients of SSPE (see, e.g.,
J. Sambrook, E.F. Fritsch, T. Maniatis, in: Molecular Cloning, A Laboratory
Manual, Cold Spring Harbor Laboratory Press (1989), vol. 3, p. B.13; see,
also, numerous catalogs that describe commonly used laboratory
solutions). SSPE is pH 7.4 phosphate-buffered 0.18 M NaCl. Further,
those of skill in the art recognize that the stability of hybrids is
determined by T_m, which is a function of the sodium ion concentration
and temperature (T_m = 81.5°C - 16.6(log10[Na+]) + 0.41(%G+C) - 600/l),
so that the only parameters in the wash conditions critical to hybrid
stability are sodium ion concentration in the SSPE (or SSC) and
temperature.
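
Because sodium ion concentration and temperature are the two wash parameters identified as critical, it can help to tabulate them for the three stringency levels. The Python sketch below takes the text's figure of 0.18 for the NaCl content of 1 x SSPE, reads it as molar (an assumption), and ignores the sodium contributed by the phosphate buffer; the names `WASH_CONDITIONS` and `wash_nacl_molar` are hypothetical.

```python
# NaCl concentration of 1 x SSPE as stated in the text, assumed to be molar.
NACL_MOLAR_IN_1X_SSPE = 0.18

# Stringency level -> (SSPE dilution factor, wash temperature in deg C).
WASH_CONDITIONS = {
    "high": (0.1, 65),
    "medium": (0.2, 50),
    "low": (1.0, 50),
}


def wash_nacl_molar(stringency: str) -> float:
    """Approximate NaCl molarity of the wash for a given stringency level."""
    factor, _temperature = WASH_CONDITIONS[stringency]
    return factor * NACL_MOLAR_IN_1X_SSPE


if __name__ == "__main__":
    for level, (factor, temperature) in WASH_CONDITIONS.items():
        print(level, f"{wash_nacl_molar(level):.3f} M NaCl", f"{temperature} C")
```

The tabulation makes the ordering explicit: the high stringency wash combines the lowest salt concentration with the highest temperature.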

It is understood that equivalent stringencies can be achieved using 
alternative buffers, salts and temperatures. By way of example and not
limitation, procedures using conditions of low stringency are as follows
(see also Shilo and Weinberg, Proc. Natl. Acad. Sci. USA 78:6789-6792
(1981)): Filters containing DNA are pretreated for 6 hours at 40°C in a
solution containing 35% formamide, 5X SSC, 50 mM Tris-HCl (pH 7.5),
5 mM EDTA, 0.1% PVP, 0.1% Ficoll, 1% BSA, and 500 µg/ml denatured
salmon sperm DNA (10X SSC is 1.5 M sodium chloride and 0.15 M
sodium citrate, adjusted to a pH of 7).

Hybridizations are carried out in the same solution with the
following modifications: 0.02% PVP, 0.02% Ficoll, 0.2% BSA, 100
µg/ml salmon sperm DNA, 10% (wt/vol) dextran sulfate, and 5-20 x 10^6
cpm 32P-labeled probe is used. Filters are incubated in hybridization
mixture for 18-20 hours at 40°C, and then washed for 1.5 hours at
55°C in a solution containing 2X SSC, 25 mM Tris-HCl (pH 7.4), 5 mM
EDTA, and 0.1% SDS. The wash solution is replaced with fresh solution
and incubated an additional 1.5 hours at 60°C. Filters are blotted dry
and exposed for autoradiography. If necessary, filters are washed for a
third time at 65-68°C and reexposed to film. Other conditions of low
stringency which can be used are well known in the art (e.g., as
employed for cross-species hybridizations).

By way of example and not by way of limitation, procedures using
conditions of moderate stringency are as follows: Filters containing DNA
are pretreated for 6 hours at 55°C in
a solution containing 6X SSC, 5X Denhardt's solution, 0.5% SDS and 100
µg/ml denatured salmon sperm DNA. Hybridizations are carried out in the
same solution and 5-20 x 10^6 cpm 32P-labeled probe is used. Filters are
incubated in hybridization mixture for 18-20 hours at 55°C, and then
washed twice for 30 minutes at 60°C in a solution containing 1X SSC
and 0.1% SDS. Filters are blotted dry and exposed for autoradiography.
Other conditions of moderate stringency which can be used are
well known in the art. Washing of filters is done at 37°C for 1 hour in a
solution containing 2X SSC, 0.1% SDS.

By way of example and not by way of limitation, procedures using
conditions of high stringency are as follows: Prehybridization of filters
containing DNA is carried out for 8 hours to overnight at 65°C in buffer
composed of 6X SSC, 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.02%
PVP, 0.02% Ficoll, 0.02% BSA, and 500 µg/ml denatured salmon sperm
DNA. Filters are hybridized for 48 hours at 65°C in prehybridization
mixture containing 100 µg/ml denatured salmon sperm DNA and 5-20 x
10^6 cpm of 32P-labeled probe. Washing of filters is done at 37°C for
1 hour in a solution containing 2X SSC, 0.01% PVP, 0.01% Ficoll, and
0.01% BSA. This is followed by a wash in 0.1X SSC at 50°C for 45
minutes before autoradiography. Other conditions of high stringency
which can be used are well known in the art.

The term substantially identical or substantially homologous or 
similar varies with the context as understood by those skilled in the 
relevant art and generally means at least 60% or 70%, preferably means 
at least 80%, 85% or more preferably at least 90%, and most preferably 
at least 95% identity. 

It is to be understood that the compounds provided herein can 
contain chiral centers. Such chiral centers can be of either the (R) or (S) 
configuration, or can be a mixture thereof. Thus, the compounds 
provided herein can be enantiomerically pure, or be stereoisomeric or 
diastereomeric mixtures. In the case of amino acid residues, such 
residues can be of either the L- or D-form. In one embodiment, the 
configuration for naturally occurring amino acid residues is L. 

As used herein, substantially pure means sufficiently homogeneous
to appear free of readily detectable impurities as determined by standard
methods of analysis, such as thin layer chromatography (TLC), gel
electrophoresis, high performance liquid chromatography (HPLC) and
mass spectrometry (MS), used by those of skill in the art to assess such
purity, or sufficiently pure such that further purification would not
detectably alter the physical and chemical properties, such as enzymatic
and biological activities, of the substance. Methods for purification of the
compounds to produce substantially chemically pure compounds are
known to those of skill in the art. A substantially chemically pure
compound can, however, be a mixture of stereoisomers. In such
instances, further purification might increase the specific activity of the
compound.

As used herein, a cleavable bond or moiety refers to a bond or
moiety that is cleaved or cleavable under the specific conditions, such as
chemically, enzymatically or photolytically. Where not specified herein,
such bond is cleavable under conditions of MALDI-MS analysis, such as
by a UV or IR laser.

As used herein, a "selectively cleavable" moiety is a moiety that
can be selectively cleaved without affecting or altering the composition of
the other portions of the compound of interest. For example, a cleavable
moiety L of the compounds provided herein is one that can be cleaved by
chemical, enzymatic, photolytic, or other means without affecting or
altering the composition (e.g., the chemical composition) of the conjugated
biomolecule, including a protein. "Non-cleavable" moieties are those that
cannot be selectively cleaved without affecting or altering the
composition of the other portions of the compound of interest.

As used herein, binding with high affinity refers to binding that
has an association constant (k_a) of at least 10^9, and generally 10^10
or 10^11 liters/mole or greater, or a K_eq of 10^9, 10^10, 10^11, 10^12
or greater. For purposes herein, high affinity bonds formed by the
reactivity groups are those that are stable to the laser (UV and IR)
used in MALDI-MS analyses.
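
By way of illustration only, an association constant of the magnitude
recited above can be restated as a dissociation constant and as an
approximate standard free energy of binding. The sketch below is not
part of the disclosure. It assumes 1:1 binding, a 1 mol/liter standard
state, and a temperature of 298.15 K; the 10^9 liters/mole default
threshold is taken from the definition above, and the function names
are hypothetical.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J/(mol*K)


def dissociation_constant(k_a: float) -> float:
    """K_d in mol/liter for a 1:1 complex with association constant k_a (liters/mole)."""
    return 1.0 / k_a


def binding_free_energy_kj(k_a: float, temp_k: float = 298.15) -> float:
    """Standard free energy of binding, -RT ln(k_a), in kJ/mol.

    Assumption: k_a is referenced to a 1 mol/liter standard state.
    """
    return -R_GAS * temp_k * math.log(k_a) / 1000.0


def is_high_affinity(k_a: float, threshold: float = 1e9) -> bool:
    """True if k_a meets the high-affinity threshold (default 10^9 liters/mole)."""
    return k_a >= threshold
```

Under these assumptions, a k_a of 10^9 liters/mole corresponds to a
dissociation constant of about 1 nanomolar and a binding free energy
of roughly -51 kJ/mol at 298.15 K.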

As used herein, "alkyl", "alkenyl" and "alkynyl", if not specified,
contain from 1 to 20 carbons, or 1 to 16 carbons, and are straight or
branched carbon chains. Alkenyl carbon chains are from 2 to 20
carbons, and, in certain embodiments, contain 1 to 8 double bonds.
Alkenyl carbon chains of 2 to 16 carbons, in certain embodiments,
contain 1 to 5 double bonds. Alkynyl carbon chains are from 2 to 20
carbons, and, in one embodiment, contain 1 to 8 triple bonds. Alkynyl
carbon chains of 2 to 16 carbons, in certain embodiments, contain 1 to 5
triple bonds. Exemplary alkyl, alkenyl and alkynyl groups include, but are
not limited to, methyl, ethyl, propyl, isopropyl, isobutyl, n-butyl,
sec-butyl, tert-butyl, isopentyl, neopentyl, tert-pentyl and isohexyl. The
alkyl, alkenyl and alkynyl groups, unless otherwise specified, can be
optionally substituted with one or more groups, including alkyl group
substituents that can be the same or different.

As used herein, "lower alkyl", "lower alkenyl", and "lower alkynyl" 
refer to carbon chains having less than about 6 carbons. 

As used herein, "alk(en)(yn)yl" refers to an alkyl group containing
at least one double bond and at least one triple bond.

As used herein, an "alkyl group substituent" includes, but is not 
limited to, halo, haloalkyl, including halo lower alkyl, aryl, hydroxy, 
alkoxy, aryloxy, alkyloxy, alkylthio, arylthio, aralkyloxy, aralkylthio, 
carboxy, alkoxycarbonyl, oxo and cycloalkyl.

As used herein, "aryl" refers to aromatic groups containing from 5
to 20 carbon atoms and can be a mono-, multicyclic or fused ring
system. Aryl groups include, but are not limited to, phenyl, naphthyl,
biphenyl, fluorenyl and others that can be unsubstituted or are
substituted with one or more substituents.

As used herein, "aryl" also refers to aryl-containing groups,
including, but not limited to, aryloxy, arylthio, arylcarbonyl and arylamino
groups.

As used herein, an "aryl group substituent" includes, but is not
limited to, alkyl, alkenyl, alkynyl, cycloalkyl, cycloalkylalkyl, aryl,
heteroaryl optionally substituted with 1 or more, including 1 to 3,
substituents selected from halo, halo alkyl and alkyl, aralkyl,
heteroaralkyl, alkenyl containing 1 to 2 double bonds, alkynyl containing
1 to 2 triple bonds, alk(en)(yn)yl groups, halo, pseudohalo, cyano,
hydroxy, haloalkyl and polyhaloalkyl, including halo lower alkyl, especially
trifluoromethyl, formyl, alkylcarbonyl, arylcarbonyl that is optionally
substituted with 1 or more, including 1 to 3, substituents selected from
halo, halo alkyl and alkyl, heteroarylcarbonyl, carboxy, alkoxycarbonyl,
aryloxycarbonyl, aminocarbonyl, alkylaminocarbonyl,
dialkylaminocarbonyl, arylaminocarbonyl, diarylaminocarbonyl,
aralkylaminocarbonyl, alkoxy, aryloxy, perfluoroalkoxy, alkenyloxy,
alkynyloxy, arylalkoxy, aminoalkyl, alkylaminoalkyl, dialkylaminoalkyl,
arylaminoalkyl, amino, alkylamino, dialkylamino, arylamino,
alkylarylamino, alkylcarbonylamino, arylcarbonylamino, azido, nitro,
mercapto, alkylthio, arylthio, perfluoroalkylthio, thiocyano,
isothiocyano, alkylsulfinyl, alkylsulfonyl, arylsulfinyl, arylsulfonyl,
aminosulfonyl, alkylaminosulfonyl, dialkylaminosulfonyl and
arylaminosulfonyl.

As used herein, "aralkyl" refers to an alkyl group in which one of the
hydrogen atoms of the alkyl is replaced by an aryl group.

As used herein, "heteroaralkyl" refers to an alkyl group in which one
of the hydrogen atoms of the alkyl is replaced by a heteroaryl group.

As used herein, "cycloalkyl" refers to a saturated mono- or
multicyclic ring system, in one embodiment, of 3 to 10 carbon atoms, or
3 to 6 carbon atoms; cycloalkenyl and cycloalkynyl refer to mono- or
multicyclic ring systems that respectively include at least one double
bond and at least one triple bond. Cycloalkenyl and cycloalkynyl groups
can contain, in one embodiment, 3 to 10 carbon atoms, with cycloalkenyl
groups, in other embodiments, containing 4 to 7 carbon atoms and
cycloalkynyl groups, in other embodiments, containing 8 to 10 carbon
atoms. The ring systems of the cycloalkyl, cycloalkenyl and cycloalkynyl
groups can be composed of one ring or two or more rings that can be
joined together in a fused, bridged or spiro-connected fashion, and can be
optionally substituted with one or more alkyl group substituents.
"Cycloalk(en)(yn)yl" refers to a cycloalkyl group containing at least one
double bond and at least one triple bond.

As used herein, "heteroaryl" refers to a monocyclic or multicyclic
ring system, in one embodiment of about 5 to about 15 members where
one or more, or 1 to 3, of the atoms in the ring system is a heteroatom,
that is, an element other than carbon, for example, nitrogen, oxygen
and sulfur atoms. The heteroaryl can be optionally substituted with one
or more, including 1 to 3, aryl group substituents. The heteroaryl group
can be optionally fused to a benzene ring. Exemplary heteroaryl groups
include, but are not limited to, pyrroles, porphyrins, furans, thiophenes,
selenophenes, pyrazoles, imidazoles, triazoles, tetrazoles, oxazoles,
oxadiazoles, thiazoles, thiadiazoles, indoles, carbazoles, benzofurans,
benzothiophenes, indazoles, benzimidazoles, benzotriazoles,
benzoxatriazoles, benzothiazoles, benzoselenazoles, benzothiadiazoles,
benzoselenadiazoles, purines, pyridines, pyridazines, pyrimidines,
pyrazines, triazines, quinolines, acridines, isoquinolines,
cinnolines, phthalazines, quinazolines, quinoxalines, phenazines,
phenanthrolines, imidazinyl, pyrrolidinyl, pyrimidinyl, tetrazolyl, thienyl,
pyridyl, pyrrolyl, N-methylpyrrolyl, quinolinyl and isoquinolinyl.

As used herein, "heteroaryl" also refers to heteroaryl-containing
groups, including, but not limited to, heteroaryloxy, heteroarylthio,
heteroarylcarbonyl and heteroarylamino.

As used herein, "heterocyclic" refers to a monocyclic or multicyclic
ring system, in one embodiment of 3 to 10 members, in another
embodiment 4 to 7 members, including 5 to 6 members, where one or
more, including 1 to 3, of the atoms in the ring system is a heteroatom,
that is, an element other than carbon, for example, nitrogen, oxygen
and sulfur atoms. The heterocycle can be optionally substituted with one
or more, or 1 to 3, aryl group substituents. In certain embodiments,
substituents of the heterocyclic group include hydroxy, amino, alkoxy
containing 1 to 4 carbon atoms, halo lower alkyl, including trihalomethyl,
such as trifluoromethyl, and halogen. As used herein, the term
heterocycle can include reference to heteroaryl.

As used herein, the nomenclature alkyl, alkoxy, carbonyl, etc., is
used as is generally understood by those of skill in this art. For example,
as used herein alkyl refers to saturated carbon chains that contain one or
more carbons; the chains can be straight or branched or include cyclic
portions or be cyclic.

Where the number of any given substituent is not specified (e.g.,
"haloalkyl"), there can be one or more substituents present. For example,
"haloalkyl" can include one or more of the same or different halogens.
As another example, "C1-3alkoxyphenyl" can include one or more of the
same or different alkoxy groups containing one, two or three carbons.

Where named substituents such as carboxy or substituents
represented by variables such as W are separately enclosed in
parentheses, yet possess no subscript outside the parentheses indicating
numerical value and that follow substituents not in parentheses, e.g.,
"C1-4alkyl(W)(carboxy)", "W" and "carboxy" are each directly attached to
C1-4alkyl.

As used herein, "halogen" or "halide" refers to F, Cl, Br or I.

As used herein, pseudohalides are compounds that behave
substantially similar to halides. Such compounds can be used in the
same manner and treated in the same manner as halides (X-, in which X
is a halogen, such as Cl or Br). Pseudohalides include, but are not
limited to, cyanide, cyanate, isocyanate, thiocyanate, isothiocyanate,
selenocyanate, trifluoromethoxy, and azide.

As used herein, "haloalkyl" refers to a lower alkyl radical in which
one or more of the hydrogen atoms are replaced by halogen including, but
not limited to, chloromethyl, trifluoromethyl, 1-chloro-2-fluoroethyl and
the like.

As used herein, "haloalkoxy" refers to RO- in which R is a haloalkyl
group.

As used herein, "sulfinyl" or "thionyl" refers to -S(O)-. As used
herein, "sulfonyl" or "sulfuryl" refers to -S(O)2-. As used herein, "sulfo"
refers to -S(O)2O-.

As used herein, "carboxy" refers to a divalent radical, -C(O)O-.

As used herein, "aminocarbonyl" refers to -C(O)NH2.

As used herein, "alkylaminocarbonyl" refers to -C(O)NHR in which R
is hydrogen or alkyl, including lower alkyl.

As used herein, "dialkylaminocarbonyl" refers to -C(O)NR'R in which
R' and R are independently selected from hydrogen or alkyl, including
lower alkyl.

As used herein, "carboxamide" refers to groups of formula -NR'COR.

As used herein, "diarylaminocarbonyl" refers to -C(O)NRR' in which R
and R' are independently selected from aryl, including lower aryl, such as
phenyl.

As used herein, "aralkylaminocarbonyl" refers to -C(O)NRR' in which
one of R and R' is aryl, including lower aryl, such as phenyl, and the
other of R and R' is alkyl, including lower alkyl.

As used herein, "arylaminocarbonyl" refers to -C(O)NHR in which R is
aryl, including lower aryl, such as phenyl.

As used herein, "alkoxycarbonyl" refers to -C(O)OR in which R is
alkyl, including lower alkyl.

As used herein, "aryloxycarbonyl" refers to -C(O)OR in which R is
aryl, including lower aryl, such as phenyl.

As used herein, "alkoxy" and "alkylthio" refer to RO- and RS-, in
which R is alkyl, including lower alkyl.

As used herein, "aryloxy" and "arylthio" refer to RO- and RS-, in
which R is aryl, including lower aryl, such as phenyl.

As used herein, "alkylene" refers to a straight, branched or cyclic,
in one embodiment straight or branched, divalent aliphatic hydrocarbon
group, in certain embodiments having from 1 to about 20 carbon atoms,
in other embodiments 1 to 12 carbons, including lower alkylene. The
alkylene group is optionally substituted with one or more "alkyl group
substituents." There can be optionally inserted along the alkylene group
one or more oxygen, sulphur or substituted or unsubstituted nitrogen
atoms, where the nitrogen substituent is alkyl as previously described.
Exemplary alkylene groups include methylene (-CH2-), ethylene
(-CH2CH2-), propylene (-(CH2)3-), cyclohexylene (-C6H10-),
methylenedioxy (-O-CH2-O-) and ethylenedioxy (-O-(CH2)2-O-). The term
"lower alkylene" refers to alkylene groups having 1 to 6 carbons. In
certain embodiments, alkylene groups are lower alkylene, including
alkylene of 1 to 3 carbon atoms.

As used herein, "alkenylene" refers to a straight, branched or
cyclic, in one embodiment straight or branched, divalent aliphatic
hydrocarbon group, in certain embodiments having from 2 to about 20
carbon atoms and at least one double bond, in other embodiments 1 to
12 carbons, including lower alkenylene. The alkenylene group is
optionally substituted with one or more "alkyl group substituents." There
can be optionally inserted along the alkenylene group one or more
oxygen, sulphur or substituted or unsubstituted nitrogen atoms, where
the nitrogen substituent is alkyl as previously described. Exemplary
alkenylene groups include -CH=CH-CH=CH- and -CH=CH-CH2-.
The term "lower alkenylene" refers to alkenylene groups having 2 to 6
carbons. In certain embodiments, alkenylene groups are lower
alkenylene, including alkenylene of 3 to 4 carbon atoms.

As used herein, "alkynylene" refers to a straight, branched or
cyclic, in one embodiment straight or branched, divalent aliphatic
hydrocarbon group, in certain embodiments having from 2 to about 20
carbon atoms and at least one triple bond, in other embodiments 1 to 12
carbons, including lower alkynylene. The alkynylene group is optionally
substituted with one or more "alkyl group substituents." There can be
optionally inserted along the alkynylene group one or more oxygen,
sulphur or substituted or unsubstituted nitrogen atoms, where the
nitrogen substituent is alkyl as previously described. Exemplary
alkynylene groups include -C≡C-C≡C-, -C≡C- and -C≡C-CH2-. The
term "lower alkynylene" refers to alkynylene groups having 2 to 6
carbons. In certain embodiments, alkynylene groups are lower
alkynylene, including alkynylene of 3 to 4 carbon atoms.

As used herein, "alk(en)(yn)ylene" refers to a straight, branched or
cyclic, in one embodiment straight or branched, divalent aliphatic
hydrocarbon group, in certain embodiments having from 2 to about 20
carbon atoms and at least one triple bond, and at least one double bond;
in other embodiments 1 to 12 carbons, including lower alk(en)(yn)ylene.
The alk(en)(yn)ylene group is optionally substituted with one or more
"alkyl group substituents." There can be optionally inserted along the
alk(en)(yn)ylene group one or more oxygen, sulphur or substituted or
unsubstituted nitrogen atoms, where the nitrogen substituent is alkyl as
previously described. Exemplary alk(en)(yn)ylene groups include
-C=C-(CH2)n-C≡C-, where n is 1 or 2. The term "lower
alk(en)(yn)ylene" refers to alk(en)(yn)ylene groups having up to 6
carbons. In certain embodiments, alk(en)(yn)ylene groups are lower
alk(en)(yn)ylene, including alk(en)(yn)ylene of 4 carbon atoms.

As used herein, "arylene" refers to a monocyclic or polycyclic, in
one embodiment monocyclic, divalent aromatic group, in certain
embodiments having from 5 to about 20 carbon atoms and at least one
aromatic ring, in other embodiments 5 to 12 carbons, including lower
arylene. The arylene group is optionally substituted with one or more
"alkyl group substituents." There can be optionally inserted around the
arylene group one or more oxygen, sulphur or substituted or
unsubstituted nitrogen atoms, where the nitrogen substituent is alkyl as
previously described. Exemplary arylene groups include 1,2-, 1,3- and
1,4-phenylene. The term "lower arylene" refers to arylene groups having
5 or 6 carbons. In certain embodiments, arylene groups are lower
arylene.

As used herein, "heteroarylene" refers to a divalent monocyclic or
multicyclic ring system, in one embodiment of about 5 to about 15
members where one or more, or 1 to 3, of the atoms in the ring system is
a heteroatom, that is, an element other than carbon, for example,
nitrogen, oxygen and sulfur atoms. The heteroarylene group can be
optionally substituted with one or more, or 1 to 3, aryl group
substituents.

As used herein, "alkylidene" refers to a divalent group, such as
=CR'R", which is attached to one atom of another group, forming a
double bond. Exemplary alkylidene groups are methylidene (=CH2) and
ethylidene (=CHCH3). As used herein, "aralkylidene" refers to an
alkylidene group in which either R' or R" is an aryl group.

As used herein, "amido" refers to the divalent group -C(O)NH-.
"Thioamido" refers to the divalent group -C(S)NH-. "Oxyamido" refers to
the divalent group -OC(O)NH-. "Thiaamido" refers to the divalent group
-SC(O)NH-. "Dithiaamido" refers to the divalent group -SC(S)NH-.
"Ureido" refers to the divalent group -HNC(O)NH-. "Thioureido" refers to
the divalent group -HNC(S)NH-.

As used herein, "semicarbazide" refers to -NHC(O)NHNH-.
"Carbazate" refers to the divalent group -OC(O)NHNH-.
"Isothiocarbazate" refers to the divalent group -SC(O)NHNH-.
"Thiocarbazate" refers to the divalent group -OC(S)NHNH-.
"Sulfonylhydrazide" refers to the group -SO2NHNH-. "Hydrazide" refers
to the divalent group -C(O)NHNH-. "Azo" refers to the divalent group
-N=N-. "Hydrazinyl" refers to the divalent group -NH-NH-.

As used herein, the term "amino acid" refers to α-amino acids that
are racemic, or of either the D- or L-configuration. The designation "d"
preceding an amino acid designation (e.g., dAla, dSer, dVal, etc.) refers
to the D-isomer of the amino acid. The designation "dl" preceding an
amino acid designation (e.g., dlAla) refers to a mixture of the L- and
D-isomers of the amino acid.

As used herein, when any particular group, such as phenyl or
pyridyl, is specified, this means that the group is unsubstituted or is
substituted. Substituents where not specified are halo, halo lower alkyl,
and lower alkyl.

As used herein, conformationally altered protein disease (or a
disease of protein aggregation) refers to diseases associated with a
protein or polypeptide that has a disease-associated conformation. The
methods and collections provided herein permit detection of a conformer
associated with a disease. Diseases and associated
proteins that exhibit two or more different conformations in which at
least one conformation is a conformationally altered protein, include, but
are not limited to, amyloid diseases and other neurodegenerative diseases
known to those of skill in the art and set forth below.

As used herein, cell sorting refers to an assay in which cells are
separated and recovered from suspension based upon properties
measured in flow cytometry analysis. Most assays used for analysis can
serve as the basis for sorting experiments, as long as gates and regions
defining the subpopulation(s) to be sorted do not logically overlap.
Maximum throughput rates are typically 5000 cells/second (18 x 10^6
cells/hour). The rate of collection of the separated population(s) depends
primarily upon the condition of the cells and the percentage of reactivity.
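
By way of illustration only, the throughput figures recited above can
be checked, and a sorting time estimated, with simple arithmetic. The
sketch below is not part of the disclosure; it assumes a constant
sorting rate with no interruptions, and the function names are
hypothetical.

```python
SECONDS_PER_HOUR = 3600


def cells_per_hour(cells_per_second: int) -> int:
    """Convert a per-second sorting rate to a per-hour rate."""
    return cells_per_second * SECONDS_PER_HOUR


def hours_to_sort(total_cells: int, cells_per_second: int = 5000) -> float:
    """Hours needed to pass total_cells through the sorter at a constant rate."""
    return total_cells / cells_per_hour(cells_per_second)
```

At 5000 cells/second the hourly rate is 5000 x 3600 = 18 x 10^6 cells,
consistent with the figure recited above.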

As used herein, the abbreviations for any protective groups, amino
acids and other compounds, are, unless indicated otherwise, in accord
with their common usage, recognized abbreviations, or the IUPAC-IUB
Commission on Biochemical Nomenclature (see, Biochem. 1972,
11:942). For example, DMF = N,N-dimethylformamide; DMAc =
N,N-dimethylacetamide; THF = tetrahydrofuran; TRIS =
tris(hydroxymethyl)aminomethane; SSPE = saline-sodium phosphate-EDTA
buffer; EDTA = ethylenediaminetetraacetic acid; SDS = sodium dodecyl
sulfate.
B. Collections of capture compounds 

Collections of capture compounds that selectively bind to
biomolecules in samples, particularly, although not exclusively, a cell
lysate or in vitro translated polypeptides from a cell lysate, are
provided. Each capture compound in the collection can bind
to specific groups or classes of biopolymers, and is designed to bind
covalently or tightly (sufficient to sustain mass spectrometric analysis,
for example) to a subset of all of the biomolecules in the sample. For
example, a sample, such as a cell lysate, can contain thousands of
members. The collections of compounds permit sufficient selectivity so
that, for example, about 10-20 of the components of the sample bind to
each member of the collection. The exact number is a small enough
number for routine analyses to identify them, generally in one step, such
as by mass spectrometry.

The collections permit a top down holistic approach to analysis of
the proteome, including post-translationally modified proteins, and other
biomolecules. Protein and other biomolecule patterns are the starting
point for analyses that use these collections, rather than nucleic acids and
the genome (bottom up). The collections can be used to assess the
biomolecule components of a sample, such as a biological sample, to
identify components specific to a particular phenotype, such as a disease
state, to identify structural function, biochemical pathways and
mechanisms of action. The collections and methods of use permit an
unbiased analysis of biomolecules, since the methods do not necessarily
assess specific classes of targets; instead, changes in samples are
detected or identified. The collections permit the components of a
complex mixture of biomolecules (i.e., a mixture of 50, 100, 500, 1000,
2000 and more) to be sorted into discrete loci containing reduced
numbers, typically by 10%, 50% or greater reduction in complexity, or to
about 1 to 50 different biomolecules per locus in an array, so that the
components at each spot can be analyzed, such as by mass
spectrometric analysis alone or in combination with other analyses. In
some embodiments, such as for phenotypic analyses, homogeneity of the
starting sample, such as cells, can be important. To provide
homogeneity, cells with different phenotypes, such as diseased versus
healthy, from the same individual are compared. Methods for doing so
are provided herein.

By virtue of the structure of compounds in the collections, the
collections can be used to detect structural changes, such as those from
the post-translational processing of proteins, and can be used to detect
changes in membrane proteins, which are involved in the most
fundamental processes, such as signal transduction, ion channels,
receptors for ligand interaction and cell-to-cell interactions. When cells
become diseased, changes associated with disease, such as
transformation, often occur in membrane proteins.

The collections contain sets of member capture compounds. In
general, members of each set differ in at least one functional
group, and generally in two or three, from members of the other sets.
Thus, for example, if the compounds include a reactivity function, a
selectivity function and a sorting function, each set differs in at least the
sorting function, typically in at least the sorting and selectivity
functions, and generally in all three functions. The solubility functions, if
present, which are selected to permit assaying in a selected environment,
can differ among the compounds, or can be the same among all sets.

In practicing methods, the collections are contacted with a sample
or partially purified or purified components thereof to effect binding of
biomolecules to capture compounds in the collection. The capture
compounds can be in an addressable array, such as bound to a solid
support prior to contacting, or can be arrayed after contacting with the
sample. The resulting array is optionally treated with a reagent that
specifically cleaves the bound polymers, such as a protease, and is
subjected to analysis, particularly mass spectrometry analysis, to identify
components of the bound biomolecules at each locus. Once a molecular
weight of a biomolecule, such as a protein or portion thereof of interest, is
determined, the biomolecule can be identified. Methods for identification
include comparison of the molecular weights with databases, for example
protein databases that include protease fragments and their molecular
weights.
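
By way of illustration only, comparison of a measured molecular weight
with a database of protease fragments can be sketched as follows. The
sketch is not part of the disclosure and rests on several assumptions:
trypsin is chosen as the protease (cleavage C-terminal to K or R,
except before P), masses are neutral monoisotopic masses of unmodified
peptides, and the mass tolerance, function names and any protein
sequences supplied by a caller are hypothetical.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # mass of H2O added for the free peptide termini


def tryptic_peptides(sequence: str) -> list:
    """Split a protein after K or R unless the next residue is P (assumed rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[res] for res in peptide) + WATER


def match_mass(observed: float, proteins: dict, tol_da: float = 0.01) -> list:
    """Return (protein name, peptide) pairs whose fragment mass is within tol_da."""
    hits = []
    for name, sequence in proteins.items():
        for pep in tryptic_peptides(sequence):
            if abs(peptide_mass(pep) - observed) <= tol_da:
                hits.append((name, pep))
    return hits
```

In use, each measured mass from a locus would be passed to
`match_mass` together with a dictionary of candidate protein
sequences; a mass that matches fragments of only one protein points
to that protein, while ambiguous matches call for additional fragment
masses or other analyses.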

The capture compounds include functional groups that confer
reactivity, selectivity and separative properties, depending on the
specificity of separation and analysis required (which depends on the
complexity of the mixture to be analyzed). As more functional groups are
added to the compounds, the compounds can exhibit increased selectivity
and develop a signature for target molecules similar to an antigen (Ag)
binding site on an antibody. In general, the compounds provided herein
include at least two functional groups (functions) selected from four
types of functions: a reactivity function, which binds to biopolymers
either covalently or with a high k_a (generally greater than about 10^9,
10^10, 10^12 liters/mole and/or such that the binding is substantially
irreversible or stable under conditions of mass spectrometric analyses,
such as MALDI-MS conditions); a selectivity function, which by virtue of
non-covalent interactions alters, generally increases, the specificity of
the reactivity function; a sorting function, which permits the compounds
to be addressed (arrayed or otherwise separated) according to the
structure of the capture compound; and a solubility function, which is
selected to alter the solubility of the compounds depending upon the
environment in which reactions are performed, permitting the conditions
to simulate physiological conditions. In general, the reactivity function
is a reactive group that specifically interacts, typically covalently or
with high binding affinity (k_a), with particular biomolecules, such as
proteins, or portions thereof; and the other functionality, the selectivity
function, alters, typically increasing, the specificity of the reactivity
function. In general, the reactive function covalently interacts with
groups on a particular biomolecule, such as amine groups on the surface
of a protein. The reactivity function interacts with biomolecules to form
a covalent bond or a non-covalent bond that is stable under conditions of
analysis, generally with a k_a of greater than 10^9 liters/mole or greater
than 10^10 liters/mole. Conditions of analysis include, but are not limited
to, mass spectrometric analysis, such as matrix assisted laser desorption
ionization-time of flight (MALDI-TOF) mass spectrometry. The selectivity
function influences the types of biomolecules that can interact with the
reactivity function through a non-covalent interaction. The selectivity
function alters the specificity for the particular groups, generally reducing
the number of such groups with which the reactivity function reacts. A
goal is to reduce the number of proteins or biomolecules bound at a
locus, so that the proteins can then be separated, such as by mass
spectrometry.

The capture compounds provided herein for use in the methods
herein can be classified in at least two sets: one for reactions in
aqueous solution (e.g., for reaction with hydrophilic biomolecules), and
the other for reaction in organic solvents (e.g., chloroform) (e.g., for
reaction with hydrophobic biomolecules). Thus, in certain embodiments,
the compounds provided herein discriminate between hydrophilic and
hydrophobic biomolecules, including, but not limited to, proteins, and
allow for analysis of both classes of biomolecules.
C. Capture Compounds 

Capture compounds (also referred to as capture agents) are
provided. The capture compounds include a core "Z" that presents one
or more reactivity functions "X" and optionally at least a selectivity
function "Y" and/or a sorting function "Q", and also optionally one or
more solubility functions "W." Additionally, cleavable linkers and other
functions are included in the molecules. The particular manner in which
the functions are presented on the core or scaffold is a matter of design
choice, but is selected such that the resulting molecule has the property
that it captures biomolecules, particularly proteins, with sufficient
specificity and either covalently or with bonds of sufficient stability or
affinity to permit analysis, such as by mass spectrometry, including
MALDI mass spectrometric analysis, so that at least a portion of bound
biomolecules remain bound (generally a binding affinity of 10^9, 10^10,
10^11 liters/mole or greater, or a K_eq of 10^9, 10^10, 10^11, 10^12 or
greater).

X, the reactivity functionality, is selected to be anything that forms
such a covalent bond or a bond of high affinity that is stable under
conditions of mass spectrometric analysis, particularly MALDI analysis.
The selectivity functionality, Y, is a group that "looks" at the topology of
the protein around reactivity binding sites and functions to select
particular groups on biomolecules from among those with which a
reactivity group can form a covalent bond (or high affinity bond). For
example, a selectivity group can cause steric hindrance, or permit specific
binding to an epitope, or anything in between. It can be a substrate for a
drug, lipid, peptide. It selects the environment of the groups with which
the reactivity function interacts. The selectivity functionality, Y, can be
one whereby a capture compound forms a covalent bond with a
biomolecule in a mixture or interacts with high stability such that the
affinity of binding of the capture compound to the biomolecule through the
reactive functionality in the presence of the selectivity functionality is at
least ten-fold or 100-fold greater than in the absence of the selectivity
functionality.

Q is a sorting function that can be anything that provides a means
for separating each set of capture compounds from the others, such as
by arraying, and includes groups such as biotin, generally with a spacer,
for binding to an avidin on a surface (or vice versa) for arraying, and
oligonucleotides for binding to oligonucleotide arrays. Any molecule that
has a cognate binding partner to which it binds with sufficient affinity to
survive mass spectrometric analysis, such as MALDI-MS analysis, can be
selected. For any collection a variety of different sorting groups can be
used; each set of capture compounds should have a unique Q compared to
the other sets. In addition, labeling means that can be sorted by virtue of
the label, such as RF tags, fluorescent tags, color-coded tags or beads,
bar-coded or other symbology labeled tags and other such labels can be
used. For example, the capture compounds or the X, Y, Z, W
functionalities can be on a surface that is attached to an RF tag or a
colored tag. These can be readily sorted after reaction so that each set
can be separately analyzed to identify bound biomolecules. Thus, the
collections can include capture compounds that have a variety of sorting
groups.

The solubility function, W, permits alteration in properties of the
capture compound components of the collection. For example, W can be
selected so that the capture compounds are soluble or not in a particular
reaction medium or environment, such as a hydrophobic environment,
thereby permitting reactions with membrane components. The
collections include sets of capture compounds, each of which set differs
in Q and in at least one or both of X and Y.
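
The relationship among the sets of a collection can be expressed as a
simple data model. The sketch below is illustrative only: the class name,
field names and string labels are hypothetical, and the check encodes one
reading of the statement above, namely that every set has its own Q and
that no two sets share the same combination of X and Y.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CaptureCompoundSet:
    """One set of capture compounds in a collection (labels are hypothetical)."""
    q: str                    # sorting function, e.g. an oligonucleotide tag
    x: str                    # reactivity function
    y: Optional[str] = None   # selectivity function (optional)
    w: Optional[str] = None   # solubility function (optional)


def validate_collection(sets: Iterable[CaptureCompoundSet]) -> None:
    """Raise ValueError unless each set has a unique Q and a unique (X, Y)."""
    seen_q = set()
    seen_xy = set()
    for s in sets:
        if s.q in seen_q:
            raise ValueError(f"sorting function Q={s.q!r} is not unique")
        if (s.x, s.y) in seen_xy:
            raise ValueError(f"sets must differ in X and/or Y: {(s.x, s.y)!r}")
        seen_q.add(s.q)
        seen_xy.add((s.x, s.y))


collection = [
    CaptureCompoundSet(q="oligo-1", x="reactive-group-A", y="selector-1"),
    CaptureCompoundSet(q="oligo-2", x="reactive-group-A", y="selector-2"),
]
validate_collection(collection)  # passes silently for a valid collection
```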

As noted, among the capture compounds provided are those with
at least three functionalities: reactivity, sorting and solubility. The
sorting function can be selectively cleavable to permit its removal. These
compounds also can include a selectivity function to alter the range of
binding of the reactivity function, which binds either covalently or with
high affinity ($K_a$ greater than $10^9$) to biomolecules, and optionally one or
both of a sorting and solubility function.
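
To give a sense of scale for the high-affinity threshold, an association
constant can be converted to a standard free energy of binding with the
general thermodynamic relation dG = -RT ln(Ka). The sketch below is
illustrative only; it assumes a 1 mole/liter standard state and a
temperature of 298.15 K, neither of which is specified in this
disclosure, and the function name is hypothetical.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J/(mol*K)


def binding_free_energy_kj(ka: float, temperature_k: float = 298.15) -> float:
    """Standard free energy of binding, dG = -R*T*ln(Ka), in kJ/mol.

    Ka is taken relative to a 1 mol/L standard state (an assumption).
    """
    if ka <= 0:
        raise ValueError("Ka must be positive")
    return -R_GAS * temperature_k * math.log(ka) / 1000.0


for ka in (1e9, 1e10, 1e11, 1e12):
    print(f"Ka = {ka:.0e} L/mol -> dG = {binding_free_energy_kj(ka):.1f} kJ/mol")
```

Under these assumptions each ten-fold increase in affinity corresponds to
roughly 5.7 kJ/mol of additional binding free energy.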

More detailed description and discussion of each functionality and 
non-limiting exemplary embodiments follow.

1. Z, the Core

Generally all compounds include a function, even if it is one atom, 
such as carbon, for presenting the functional groups. In certain 
embodiments herein, in the capture compounds for use in the methods 
provided herein, Z is a moiety that is cleavable prior to or during analysis
of the biomolecule, including mass spectral analysis, without altering the 
chemical structure of the biomolecule, including, but not limited to, a 
protein. 

For example, in some embodiments, the methods provided herein
include a step of mass spectral analysis of biomolecules, including
proteins, which are displayed in an addressable format. In certain
embodiments, the compounds are then bound to an array of single
oligonucleotides that include single-stranded portions (or portions that can
be made single-stranded) that are complementary to the oligonucleotide
portions, or oligonucleotide analog portions, (Q, the sorting function) of
the capture compounds. In these embodiments, Z can be selected to be
a group that is (i) stable to the reaction conditions required for reaction of
the compounds provided herein with the biomolecule, such as a protein,
(ii) stable to the conditions required for hybridization of the Q moiety with
the single-stranded oligonucleotides, and (iii) cleavable prior to or during
analysis of the biomolecule.
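
Sorting by hybridization relies on each oligonucleotide Q pairing with the
array position that carries its complementary sequence. The following
minimal sketch illustrates that lookup for plain DNA sequences; the
function names, the example tags and the array layout are hypothetical,
and oligonucleotide analogs and hybridization conditions are not modeled.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (5'->3')."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence may contain only A, C, G and T")
    return seq.translate(_COMPLEMENT)[::-1]


def assign_addresses(q_tags, array_probes):
    """Map each Q tag to the array position whose probe complements it.

    `array_probes` maps a position (any hashable) to a probe sequence.
    Tags without a complementary probe map to None.
    """
    probe_index = {probe.upper(): pos for pos, probe in array_probes.items()}
    return {tag: probe_index.get(reverse_complement(tag)) for tag in q_tags}


# Hypothetical 1 x 2 array and Q tags, for illustration only.
probes = {(0, 0): "GCAT", (0, 1): "TTGACC"}
print(assign_addresses(["ATGC", "GGTCAA", "CCCC"], probes))
```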

In another embodiment, Z with the linked functional groups can be
designed so that, with the Q, X, W and/or Y, it dissolves into lipid bilayers
of a cell membrane, thereby contacting internal portions of cell membrane
proteins through the X and Y functions. In this embodiment, the support
captures proteins, such as membrane proteins and organelle proteins,
including proteins within cell membranes. The capture compounds and
functional groups can be selected so that the resulting capture compounds
function under selected physiological conditions. Thus, the choice of Z,
Q, X, W and/or Y allows for design of surfaces and supports that mimic
cell membranes and other biological membranes.

In some embodiments, a lipid bilayer, such as those used for
forming liposomes and other micelles, can be provided on the surface of a
support as a way of maintaining the structures of membrane proteins.
This can be employed where the
support is the "Z" function and the other functions are linked thereto, or
where the compounds are linked to a support through a Q group, such as
by double-stranded oligonucleotides. The resulting immobilized capture
compounds can be coated with or dissolved in a lipid coating. As a
result, compounds and collections provided herein can act as an artificial
membrane. Dendrimer polymer chemistry can be employed for controlled
synthesis of membranes having consistent pore dimensions and
membrane thicknesses, through synthesis of amphiphilic dendrimeric or
hyperbranched block copolymers that can be self-assembled to form
ultrathin organic film membranes on porous supports. In one
embodiment, an organic film membrane is composed of a linear-dendritic
diblock copolymer composed of polyamidoamine (PAMAM) dendrimer
attached to one end of a linear polyethylene oxide (PEO) block.

Z is cleavable under the conditions of mass
spectrometric analysis

In one such embodiment, Z is a photocleavable group that is
cleaved by a laser used in MALDI-TOF mass spectrometry. In another
embodiment, Z is an acid labile group that is cleaved upon application of
a matrix for mass spectrometric analysis to arrayed, such as hybridized,
compound-biomolecule conjugates, or by exposure to acids (e.g.,
trifluoroacetic or hydrochloric acids) in a vapor or liquid form, prior to
analysis. In this embodiment, the matrix maintains the spatial integrity of
the array, allowing for addressable analysis of the array.

Z is not cleavable under the conditions of mass
spectrometric analysis

In certain embodiments, the capture compounds for use in the
methods provided herein have a Z moiety that is not cleavable under
conditions used for analysis of biomolecules, including, but not limited to,
mass spectrometry, such as matrix assisted laser desorption
ionization-time of flight (MALDI-TOF) mass spectrometry. Capture
compounds of these embodiments can be used, for example, in methods
provided herein for identifying biomolecules in mixtures thereof, for
determining biomolecule-biomolecule, including protein-protein,
interactions, and for determining biomolecule-small molecule, including
protein-drug or protein-drug candidate, interactions. In these
embodiments, it is not necessary for the Z group to be cleaved for the
analysis.

Thus, as noted, Z can be virtually any moiety that serves as a core
to present the binding (the selectivity and reactivity functions) and the
solubility and sorting functions. A variety are exemplified herein, but
others may be substituted. The precise nature can be a matter of design
choice in view of the disclosure herein and the skill of the skilled artisan.

a. Divalent Z moieties 

In one embodiment, Z is a cleavable or non-cleavable divalent
group that contains generally 50 or fewer, or less than 20, members, and
is selected from straight or branched chain alkylene, straight or branched
chain alkenylene, straight or branched chain alkynylene, straight or
branched chain alkylenoxy, straight or branched chain alkylenthio, straight
or branched chain alkylencarbonyl, straight or branched chain
alkylenamino, cycloalkylene, cycloalkenylene, cycloalkynylene,
cycloalkylenoxy, cycloalkylenthio, cycloalkylencarbonyl,
cycloalkylenamino, heterocyclylene, arylene, arylenoxy, arylenthio,
arylencarbonyl, arylenamino, heteroarylene, heteroarylenoxy,
heteroarylenthio, heteroarylencarbonyl, heteroarylenamino, oxy, thio,
carbonyl, carbonyloxy, ester, amino, amido, phosphino, phosphineoxido,
phosphoramidato, phosphinamidato, sulfonamido, sulfonyl, sulfoxide,
carbamate, ureido, and combinations thereof, and is optionally
substituted with one or more, including one, two, three or four,
substituents each independently selected from Y, as described elsewhere
herein.

In other embodiments, Z is a divalent cleavable or non-cleavable
group selected from straight or branched chain alkyl, straight or branched
chain alkenyl, straight or branched chain alkynyl, $-(C(R^{15})_2)_d-$, $-O-$, $-S-$,
$-(CH_2)_d-$, $-(CH_2)_dO-$, $-(CH_2)_dS-$, $>N(R^{15})$, $-(S(O)_u)-$, $-(S(O)_2)_w-$, $>C(O)$,
$-(C(O))_w-$, $-(C(S(O)_u))_w-$, $-(C(O)O)_w-$, $-(C(R^{15})_2)_dO-$, $-(C(R^{15})_2)_dS(O)_u-$,
$-O(C(R^{15})_2)_d-$, $-S(O)_u(C(R^{15})_2)_d-$, $-(C(R^{15})_2)_dO(C(R^{15})_2)_d-$,
$-(C(R^{15})_2)_dS(O)_u(C(R^{15})_2)_d-$, $-N(R^{15})(C(R^{15})_2)_d-$, $-(C(R^{15})_2)_dNR^{15}-$,
$-(C(R^{15})_2)_dN(R^{15})(C(R^{15})_2)_d-$, $-(S(R^{15})(O)_u)_w-$, $-(C(R^{15})_2)_d-$,
$-(C(R^{15})_2)_dO(C(R^{15})_2)_d-$, $-(C(R^{15})_2)_d(C(O)O)_w(C(R^{15})_2)_d-$, $-(C(O)O)_w(C(R^{15})_2)_d-$,
$-(C(R^{15})_2)_d(C(O)O)_w-$, $-(C(S)(R^{15}))_w-$, $-(C(O))_w(CR^{15}_2)_d-$,
$-(CR^{15})_d(C(O))_w(CR^{15})_d-$, $-(C(R^{15})_2)_d(C(O))_w-$, $-N(R^{15})(C(R^{15})_2)_w-$,
$-OC(R^{15})_2C(O)-$, $-OC(R^{15})_2C(O)N(R^{15})-$, $-(C(R^{15})_2)_wN(R^{15})(C(R^{15})_2)_w-$,
$-(C(R^{15})_2)_wN(R^{15})-$, $>P(O)_v(R^{15})_x$, $>P(O)_u(R^{15})_3$, $>P(O)_u(C(R^{15})_2)_d$, $>Si(R^{15})_2$
and combinations of any of these groups;

where u, v and x are each independently 0 to 5;
each d is independently an integer from 1 to 20, or 1 to 12, or 1 to 6,
or 1 to 3;
each w is independently an integer selected from 1 to 6, or 1 to 3,
or 1 to 2; and

each $R^{15}$ is independently a monovalent group selected from
straight or branched chain alkyl, straight or branched chain alkenyl,
straight or branched chain alkynyl, cycloalkyl, cycloalkenyl, cycloalkynyl,
heterocyclyl, straight or branched chain heterocyclylalkyl, straight or
branched chain heterocyclylalkenyl, straight or branched chain
heterocyclylalkynyl, aryl, straight or branched chain arylalkyl, straight or
branched chain arylalkenyl, straight or branched chain arylalkynyl,
heteroaryl, straight or branched chain heteroarylalkyl, straight or branched
chain heteroarylalkenyl, straight or branched chain heteroarylalkynyl, halo,
straight or branched chain haloalkyl, pseudohalo, azido, cyano, nitro,
$OR^{60}$, $NR^{60}R^{61}$, $COOR^{60}$, $C(O)R^{60}$, $C(O)NR^{60}R^{61}$, $S(O)_qR^{60}$, $S(O)_qOR^{60}$,
$S(O)_qNR^{60}R^{61}$, $NR^{60}C(O)R^{61}$, $NR^{60}C(O)NR^{60}R^{61}$, $NR^{60}S(O)_qR^{60}$, $SiR^{60}R^{61}R^{62}$,
$P(R^{60})_2$, $P(O)(R^{60})_2$, $P(OR^{60})_2$, $P(O)(OR^{60})_2$, $P(O)(OR^{60})(R^{61})$ and $P(O)NR^{60}R^{61}$,
where q is an integer from 0 to 2;

each $R^{60}$, $R^{61}$, and $R^{62}$ is independently hydrogen, straight or
branched chain alkyl, straight or branched chain alkenyl, straight or
branched chain alkynyl, aryl, straight or branched chain aralkyl, straight or
branched chain aralkenyl, straight or branched chain aralkynyl, heteroaryl,
straight or branched chain heteroaralkyl, straight or branched chain
heteroaralkenyl, straight or branched chain heteroaralkynyl, heterocyclyl,
straight or branched chain heterocyclylalkyl, straight or branched chain
heterocyclylalkenyl or straight or branched chain heterocyclylalkynyl.

In other embodiments, Z is a cleavable or non-cleavable divalent
group having any combination of the following groups: arylene,
heteroarylene, cycloalkylene, $>C(R^{15})_2$, $-C(R^{15})=C(R^{15})-$, $>C=C(R^{23})(R^{24})$,
$>C(R^{23})(R^{24})$, $-C\equiv C-$, $-O-$, $>S(A)_u$, $>P(D)_v(R^{15})$, $>P(D)_v(ER^{15})$, $>N(R^{15})$,
$>N^+(R^{23})(R^{24})$, $>Si(R^{15})_2$ or $>C(E)$; where u is 0, 1 or 2; v is 0, 1, 2 or 3;
A is -O- or $-NR^{15}$; D is -S- or -O-; and E is -S-, -O- or $-NR^{15}$; which groups
can be combined in any order;

each $R^{15}$ is a monovalent group independently selected from the
group consisting of hydrogen and $V-R^{18}$;

each V is a divalent group independently having any combination of
the following groups: a direct link, arylene, heteroarylene, cycloalkylene,
$>C(R^{17})_2$, $-C(R^{17})=C(R^{17})-$, $>C=C(R^{23})(R^{24})$, $>C(R^{23})(R^{24})$, $-C\equiv C-$, $-O-$,
$>S(A)_u$, $>P(D)_v(R^{17})$, $>P(D)_v(ER^{17})$, $>N(R^{17})$, $>N(COR^{17})$, $>N^+(R^{23})(R^{24})$,
$>Si(R^{17})_2$ and $>C(E)$; where u is 0, 1 or 2; v is 0, 1, 2 or 3; A is -O- or
$-NR^{17}$; D is -S- or -O-; and E is -S-, -O- or $-NR^{17}$; which groups can be
combined in any order;

$R^{17}$ and $R^{18}$ are each independently selected from the group
consisting of hydrogen, halo, pseudohalo, cyano, azido, nitro,
$-SiR^{27}R^{28}R^{25}$, alkyl, alkenyl, alkynyl, haloalkyl, haloalkoxy, aryl, aralkyl,
aralkenyl, aralkynyl, heteroaryl, heteroaralkyl, heteroaralkenyl,
heteroaralkynyl, heterocyclyl, heterocyclylalkyl, heterocyclylalkenyl,
heterocyclylalkynyl, hydroxy, alkoxy, aryloxy, aralkoxy, heteroaralkoxy
and $-NR^{19}R^{20}$;

$R^{19}$ and $R^{20}$ are each independently selected from hydrogen, alkyl,
alkenyl, alkynyl, cycloalkyl, aryl, aralkyl, heteroaryl, heteroaralkyl and
heterocyclyl;

$R^{23}$ and $R^{24}$ are selected from (i) or (ii) as follows:
(i) $R^{23}$ and $R^{24}$ are independently selected from the group consisting
of hydrogen, alkyl, alkenyl, alkynyl, cycloalkyl, aryl and heteroaryl; or
(ii) $R^{23}$ and $R^{24}$ together form alkylene, alkenylene or cycloalkylene;

$R^{25}$, $R^{27}$ and $R^{28}$ are each independently a monovalent group
selected from hydrogen, alkyl, alkenyl, alkynyl, haloalkyl, haloalkoxy, aryl,
aralkyl, aralkenyl, aralkynyl, heteroaryl, heteroaralkyl, heteroaralkenyl,
heteroaralkynyl, heterocyclyl, heterocyclylalkyl, heterocyclylalkenyl,
heterocyclylalkynyl, hydroxy, alkoxy, aryloxy, aralkoxy, heteroaralkoxy
and $-NR^{19}R^{20}$;

$R^{15}$, $R^{17}$, $R^{18}$, $R^{19}$, $R^{20}$, $R^{23}$, $R^{24}$, $R^{25}$, $R^{27}$ and $R^{28}$ can be substituted
with one or more substituents each independently selected from $Z^2$, where
$Z^2$ is selected from alkyl, alkenyl, alkynyl, aryl, cycloalkyl,
cycloalkenyl, hydroxy, $-S(O)_hR^{35}$ where h is 0, 1 or 2, $-NR^{35}R^{36}$, $-COOR^{35}$,
$-COR^{35}$, $-CONR^{35}R^{36}$, $-OC(O)NR^{35}R^{36}$, $-N(R^{35})C(O)R^{36}$, alkoxy, aryloxy,
heteroaryl, heterocyclyl, heteroaryloxy, heterocyclyloxy, aralkyl, aralkenyl,
aralkynyl, heteroaralkyl, heteroaralkenyl, heteroaralkynyl, aralkoxy,
heteroaralkoxy, alkoxycarbonyl, carbamoyl, thiocarbamoyl,
alkoxycarbonyl, carboxyaryl, halo, pseudohalo, haloalkyl and carboxamido;

$R^{35}$ and $R^{36}$ are each independently selected from among hydrogen,
halo, pseudohalo, cyano, azido, nitro, trialkylsilyl, dialkylarylsilyl,
alkyldiarylsilyl, triarylsilyl, alkyl, alkenyl, alkynyl, haloalkyl, haloalkoxy,
aryl, aralkyl, aralkenyl, aralkynyl, heteroaryl, heteroaralkyl,
heteroaralkenyl, heteroaralkynyl, heterocyclyl, heterocyclylalkyl,
heterocyclylalkenyl, heterocyclylalkynyl, hydroxy, alkoxy, aryloxy,
aralkoxy, heteroaralkoxy, amino, amido, alkylamino, dialkylamino,
alkylarylamino, diarylamino and arylamino.

In certain embodiments herein, the compounds are selected with
the proviso that Z is cleavable prior to or during analysis, including mass
spectral analysis, such as matrix assisted laser desorption ionization-time
of flight (MALDI-TOF) mass spectrometry, of the biomolecule.

In certain embodiments, Z is at least a trivalent moiety selected
from the divalent moieties disclosed herein absent at least one hydrogen.
The capture compounds in the collections provided herein include a core
Z that has a variety of valencies. Among the capture compounds are
those in which Z is at least trivalent. Also among the compounds in the
collections are those where Z is divalent and linked to either a Q and an
X, or a Q and a Y, or an X and a Y, or other combination of the
functionalities provided herein.

(i) Cleavable divalent Z moieties

In one embodiment, Z is a cleavable divalent moiety and has the
formula: $-(S^1)_t-M(R^{15})_a-(S^2)_b-L-$,

where $S^1$ and $S^2$ are spacer moieties; t and b are each independently 0 or
1; M is a central moiety possessing two or more points of attachment
(i.e., divalent or higher valency), in certain embodiments, two to six
points of attachment (i.e., divalent to hexavalent), in other embodiments,
2, 3, 4 or 5 points of attachment (i.e., divalent, trivalent, tetravalent or
pentavalent); $R^{15}$ is as described above; a is 0 to 4, in certain
embodiments, 0, 1 or 2; and L is a bond that is cleavable prior to or
during analysis, including mass spectral analysis, of a biomolecule
without altering the chemical structure of the biomolecule, such as a
protein.

(a) M 

In certain embodiments, M is alkylene, phenylene, biphenylene or a
divalent heterobifunctional trityl derivative. M is unsubstituted or is
substituted with 1 to 4 groups, each independently selected from $R^{15}$.

In other embodiments, M is selected from $-(CH_2)_r-$, $-(CH_2O)_r-$,
$-(CH_2CH_2-O)_r-$, $-(NH-(CH_2)_r-C(=O))_s-$, $-(NH-CH(R^{52})-C(=O))_r-$,
$-(O-(CH)_r-C(=O))_s-$,

[chemical structure drawings not reproduced]

where $R^{15}$ is as defined above; r and s are each independently an integer
from 1 to 10; $R^{52}$ is the side chain of a natural α-amino acid; and z is an
integer from 1 to 4. In one embodiment, z is 1.

In certain embodiments, $R^{15}$ is $-H$, $-OH$, $-OR^{51}$, $-SH$, $-SR^{51}$, $-NH_2$,
$-NHR^{51}$, $-N(R^{51})_2$, $-F$, $-Cl$, $-Br$, $-I$, $-SO_3H$, $-PO_4^{2-}$, $-CH_3$, $-CH_2CH_3$, $-CH(CH_3)_2$
or $-C(CH_3)_3$; where $R^{51}$ is straight or branched chain alkyl, straight or
branched chain alkenyl, straight or branched chain alkynyl, aryl,
heteroaryl, cycloalkyl, heterocyclyl, straight or branched chain aralkyl,
straight or branched chain aralkenyl, straight or branched chain aralkynyl,
straight or branched chain heteroaralkyl, straight or branched chain
heteroaralkenyl, straight or branched chain heteroaralkynyl, straight or
branched chain cycloalkylalkyl, straight or branched chain
cycloalkylalkenyl, straight or branched chain cycloalkylalkynyl, straight or
branched chain heterocyclylalkyl, straight or branched chain
heterocyclylalkenyl or straight or branched chain heterocyclylalkynyl.

(b) $S^1$ and $S^2$

Optionally, a spacer region $S^1$ and/or $S^2$ can be present on either or
both sides of the central moiety M (linked to Z) of the compounds, for
example, to reduce steric hindrance in reactions with the surface of large
biomolecules and/or for facilitating sorting. These can be any groups that
provide for spacing, typically without altering desired functional properties
of the capture compounds and/or capture compound/biomolecule
complexes. Those of skill in the art, in light of the disclosure herein, can
readily select suitable spacers. Exemplary spacers are set forth below.

In embodiments where, for example, the biomolecule and the
sorting function possess low steric hindrance, a spacer is optional. In
certain embodiments, steric hindrance also can enhance selectivity in
conjunction with Y (or in the absence of a Y). This enhanced selectivity
can be achieved either by the presence of a selectivity function, Y, that is
attached to M or by the selection of the appropriate spacer molecules for
$S^1$ and/or $S^2$.

If $S^2$ is not required, the reactivity of the cleavable bond L can be
influenced by one or more substituted functionalities, for example, $R^{15}$ on
M. Electronic (e.g., mesomeric, inductive) and/or steric effects can be
used to modulate the stability of the cleavable bond L. For example, if M
is a trityl derivative, the linkage to the biomolecule, including, but not
limited to, a protein, is in one embodiment a trityl ether bond. The
sensitivity of this bond to mild acids, such as acetic acid or the vapor of
trifluoroacetic acid, can be significantly enhanced by having as $R^{15}$ one or
two electron donating groups, including, but not limited to, alkoxy
groups, such as methoxy groups, in the para positions of the aryl rings.
Alternatively, the trityl ether bond can be stabilized by the introduction of
electron withdrawing groups, including, but not limited to, halogen groups,
including bromo and chloro, nitro groups or ester moieties, in the
para and/or ortho positions of the aromatic rings.
In certain embodiments, $S^1$ and $S^2$ are each independently selected
from $-(CH_2)_r-$, $-(CH_2O)-$, $-(CH_2CH_2-O)_r-$, $-(NH-(CH_2)_r-C(=O))_s-$,
$-(NH-CH(R^{52})-C(=O))_s-$, $-(O-(CH)_r-C(=O))_s-$,

[chemical structure drawings not reproduced]

where $R^{15}$ is selected as above; r and s are each independently an integer
from 1 to 10; $R^{52}$ is the side chain of a natural α-amino acid; and y is an
integer from 0 to 4. In one embodiment, y is 0 or 1.
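
Because the spacers contribute to the mass observed in mass spectrometric
analysis, it can be useful to tabulate the mass added by a given repeat
count. The sketch below is illustrative only and covers three of the
linear spacer types listed above; the dictionary keys and the function
name are hypothetical, the oxymethylene entry assumes a repeat count
applies to that unit, and end groups are not included.

```python
# Monoisotopic atomic masses (u).
ATOMIC_MASS = {"C": 12.0, "H": 1.00782503, "O": 15.9949146}

# Elemental composition of one repeat unit of each spacer type.
REPEAT_UNITS = {
    "methylene": {"C": 1, "H": 2},                # -(CH2)r-
    "oxymethylene": {"C": 1, "H": 2, "O": 1},     # -(CH2O)r-
    "ethylene_oxide": {"C": 2, "H": 4, "O": 1},   # -(CH2CH2-O)r-
}


def spacer_mass(kind: str, r: int) -> float:
    """Monoisotopic mass contributed by r repeat units of a spacer."""
    if not 1 <= r <= 10:
        raise ValueError("r must be an integer from 1 to 10")
    unit = REPEAT_UNITS[kind]
    return r * sum(ATOMIC_MASS[el] * n for el, n in unit.items())


for r in (1, 3, 10):
    print(r, round(spacer_mass("ethylene_oxide", r), 4))
```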

In certain embodiments, $R^{15}$ is $-H$, $-OH$, $-OR^{51}$, $-SH$, $-SR^{51}$, $-NH_2$,
$-NHR^{51}$, $-N(R^{51})_2$, $-F$, $-Cl$, $-Br$, $-I$, $-SO_3H$, $-PO_4^{2-}$, $-CH_3$, $-CH_2CH_3$, $-CH(CH_3)_2$ or
$-C(CH_3)_3$; where $R^{51}$ is straight or branched chain alkyl, straight or
branched chain alkenyl, straight or branched chain alkynyl, aryl,
heteroaryl, cycloalkyl, heterocyclyl, straight or branched chain aralkyl,
straight or branched chain aralkenyl, straight or branched chain aralkynyl,
straight or branched chain heteroaralkyl, straight or branched chain
heteroaralkenyl, straight or branched chain heteroaralkynyl, straight or
branched chain cycloalkylalkyl, straight or branched chain
cycloalkylalkenyl, straight or branched chain cycloalkylalkynyl, straight or
branched chain heterocyclylalkyl, straight or branched chain
heterocyclylalkenyl or straight or branched chain heterocyclylalkynyl.

(c) L 

In certain embodiments, the cleavable group L is cleaved either
prior to or during analysis of the biomolecule, such as a protein. The
analysis can include mass spectral analysis, for example MALDI-TOF
mass spectral analysis. The cleavable group L is selected so that the
group is stable during conjugation to a biomolecule, and sorting, such as
hybridization of a single stranded oligonucleotide Q moiety to a
complementary sequence, and washing of the hybrid; but is susceptible
to cleavage under conditions of analysis of the biomolecule, including, but
not limited to, mass spectral analysis, for example MALDI-TOF analysis.
In certain embodiments, the cleavable group L can be a disulfide moiety,
created by reaction of the compounds where X = -SH, with the thiol side
chain of cysteine residues on the surface of biomolecules, including, but
not limited to, proteins. The resulting disulfide bond can be cleaved
under various reducing conditions including, but not limited to, treatment
with dithiothreitol and 2-mercaptoethanol.

In another embodiment, L is a photocleavable group, which can be
cleaved by a short treatment with UV light of the appropriate wavelength
either prior to or during mass spectrometry. Photocleavable groups,
including those bonds that can be cleaved during MALDI-TOF mass
spectrometry by the action of a laser beam, can be used. For example, a
trityl ether or an ortho nitro substituted aralkyl, including benzyl, group
is susceptible to laser induced bond cleavage during MALDI-TOF mass
spectrometry. Other useful photocleavable groups include, but are not
limited to, o-nitrobenzyl, phenacyl, and nitrophenylsulfenyl groups.

Other photocleavable groups for use herein include those disclosed
in International Patent Application Publication No. WO 98/20166. In one
embodiment, the photocleavable groups have formula I:

[Chemical structure of formula (I) not reproduced]

where $R^{20}$ is $\omega$-O-alkylene-; $R^{21}$ is selected from hydrogen, alkyl, aryl,
alkoxycarbonyl, aryloxycarbonyl and carboxy; t is 0-3; and $R^{50}$ is alkyl,
alkoxy, aryl or aryloxy. In one embodiment, Q is attached to $R^{20}$ through
$(S^1)_t-M(R^{15})_a-(S^2)_b$; and the biomolecule of interest is captured onto the
$R^{21}CH-O-$ moiety via a reactive derivative of the oxygen (e.g., X).

In another embodiment, the photocleavable groups have formula II:

[Chemical structure of formula (II) not reproduced]

where $R^{20}$ is $\omega$-O-alkylene- or alkylene; $R^{21}$ is selected from hydrogen,
alkyl, aryl, alkoxycarbonyl, aryloxycarbonyl and carboxy; and $X^{20}$ is
hydrogen, alkyl or $OR^{21}$. In one embodiment, Q is attached to $R^{20}$
through $(S^1)_t-M(R^{15})_a-(S^2)_b$; and the biomolecule of interest is captured
onto the $R^{21}CH-O-$ moiety via a reactive derivative of the oxygen (e.g.,
X).

In further embodiments, $R^{20}$ is $-O-(CH_2)_3-$ or methylene; $R^{21}$ is
selected from hydrogen, methyl and carboxy; and $X^{20}$ is hydrogen, methyl
or $OR^{21}$. In another embodiment, $R^{21}$ is methyl; and $X^{20}$ is hydrogen. In
certain embodiments, $R^{20}$ is methylene; $R^{21}$ is methyl; and $X^{20}$ is
3-(4,4'-dimethoxytrityloxy)propoxy.

In another embodiment, the photocleavable groups have formula III:

[Chemical structure of formula (III) not reproduced]

where $R^2$ is selected from $\omega$-O-alkylene-O and $\omega$-O-alkylene-, and is
unsubstituted or substituted on the alkylene chain with one or more alkyl
groups; c and e are each independently 0-4; and $R^{70}$ and $R^{71}$ are each
independently alkyl, alkoxy, aryl or aryloxy. In certain embodiments, $R^2$ is
$\omega$-O-alkylene-, and is substituted on the alkylene chain with a methyl
group. In one embodiment, Q is attached to $R^2$ through
$(S^1)_t-M(R^{15})_a-(S^2)_b$; and the biomolecule of interest is captured onto the
$Ar_2CH-O-$ moiety via a reactive derivative of the oxygen (e.g., X).

In further embodiments, $R^2$ is selected from $3-O-(CH_2)_3-O$,
$4-O-(CH_2)_4-$, $3-O-(CH_2)_3-$, $2-O-CH_2CH_2-$, $-OCH_2-$,
and further groups shown as structure drawings
[chemical structure drawings not reproduced].


In other embodiments, c and e are 0. 

Other cleavable groups L include acid sensitive groups, where bond
cleavage is promoted by formation of a cation upon exposure to mild to
strong acids. For these acid-labile groups, cleavage of the group L can be
effected either prior to or during analysis, including mass spectrometric
analysis, by the acidity of the matrix molecules, or by applying a short
treatment of the array with an acid, such as the vapor of trifluoroacetic
acid. Exposure of a trityl group to acetic or trifluoroacetic acid produces
cleavage of the ether bond either before or during MALDI-TOF mass
spectrometry.

The capture compound-biomolecule array can be treated with either
chemical reagents, including, but not limited to, cyanogen bromide, or
enzymatic reagents, including, but not limited to, in embodiments where
the biomolecule is a protein, trypsin, chymotrypsin and exopeptidases
(e.g., aminopeptidase and carboxypeptidase), to effect cleavage. For the
latter, all but one
peptide fragment will remain hybridized when digestion is quantitative.
Partial digestion also can be of advantage to identify and characterize
proteins following desorption from the array. The cleaved protein/peptide
fragments are desorbed, analyzed, and characterized by their respective
molecular weights.
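
The molecular weights expected from an enzymatic digestion can be
predicted from a protein sequence. The following minimal sketch performs
an in-silico digestion using the commonly applied trypsin rule (cleavage
after lysine or arginine, except before proline) and computes monoisotopic
masses of the resulting unmodified peptides. The function names and the
example sequence are hypothetical; complete digestion is assumed, and
modifications, missed cleavages and the mass of any attached capture
compound are not modeled.

```python
import re
from typing import List

# Monoisotopic residue masses (u) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # H2O added for the free N- and C-termini


def tryptic_digest(sequence: str) -> List[str]:
    """Cleave after K or R unless the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence.upper()) if p]


def peptide_mass(peptide: str) -> float:
    """Monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


# Hypothetical sequence used only for illustration.
for fragment in tryptic_digest("AKRPGKFW"):
    print(fragment, round(peptide_mass(fragment), 4))
```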

In certain embodiments herein, L is selected from $-S-S-$,
$-O-P(=O)(OR^{51})-NH-$, $-O-C(=O)-$,

[chemical structure drawings not reproduced]

where $R^{15}$, $R^{51}$ and y are as defined above. In certain embodiments, $R^{15}$
is $-H$, $-OH$, $-OR^{51}$, $-SH$, $-SR^{51}$, $-NH_2$, $-NHR^{51}$, $-N(R^{51})_2$, $-F$, $-Cl$, $-Br$, $-I$,
$-SO_3H$, $-PO_4^{2-}$, $-CH_3$, $-CH_2CH_3$, $-CH(CH_3)_2$ or $-C(CH_3)_3$; where $R^{51}$ is straight
or branched chain alkyl, straight or branched chain alkenyl, straight or
branched chain alkynyl, aryl, heteroaryl, cycloalkyl, heterocyclyl, straight
or branched chain aralkyl, straight or branched chain aralkenyl, straight or
branched chain aralkynyl, straight or branched chain heteroaralkyl,
straight or branched chain heteroaralkenyl, straight or branched chain
heteroaralkynyl, straight or branched chain cycloalkylalkyl, straight or
branched chain cycloalkylalkenyl, straight or branched chain
cycloalkylalkynyl, straight or branched chain heterocyclylalkyl, straight or
branched chain heterocyclylalkenyl or straight or branched chain
heterocyclylalkynyl.

(ii) Non-cleavable divalent Z moieties

In another embodiment, Z is a non-cleavable divalent moiety and
has the formula: $-(S^1)_t-M(R^{15})_a-(S^2)_b-$,
where $S^1$, M, $R^{15}$, $S^2$, t, a and b are as defined above.

b. Z has a dendrimeric structure

In another embodiment, Z has a dendritic structure (i.e., Z is a
multivalent dendrimer) that is linked to a plurality of Q and X moieties. Z,
in certain embodiments, has about 4 up to about 6, about 8, about 10,
about 20, about 40, about 60 or more points of attachment (i.e., Z is
tetravalent up to hexavalent, octavalent, decavalent, didecavalent,
tetradecavalent, hexadecavalent, etc.). In these embodiments, the
dendritic moiety Z is based on a multivalent core M, as defined above.
The number of points of attachment on M may vary from about 2 up to
about 4, about 6, about 8, or more. Thus, in one embodiment, Z has the
structure:

[chemical structure drawing not reproduced]

where M is as defined above, and is linked to a plurality of Q, Y, W and X
moieties.

In another embodiment, Z has the structure:

[chemical structure drawing not reproduced]

where M is as defined above, and is linked to a plurality of Q, Y, W and X
moieties.

In other embodiments, the dendritic Z moieties may optionally
possess a plurality of spacer groups $S^1$ and/or $S^2$, or for embodiments
where Z is a cleavable linkage, a plurality of L groups. The $S^1$, $S^2$ and/or
L moieties are attached to the end of the dendritic chain(s).

In these embodiments, the density of the biopolymer to be 
analyzed, and thus signal intensity of the subsequent analysis, is 
increased relative to embodiments where Z is a divalent group. 
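
The gain in attachment points, and hence in the amount of biopolymer that
can be presented per core, can be estimated for an idealized dendrimer.
The sketch below uses the textbook relation for defect-free dendritic
growth, in which the number of terminal groups equals the core valency
multiplied by the branching multiplicity raised to the generation number.
This relation and the parameter names are assumptions introduced for
illustration; they are not taken from this disclosure, and real
dendrimers can deviate from the ideal count.

```python
def terminal_groups(core_valency: int, branching: int, generation: int) -> int:
    """Ideal number of terminal attachment points of a dendrimer.

    core_valency: points of attachment on the core M
    branching:    new branches created at each branch point
    generation:   number of completed branching layers (0 = bare core)
    """
    if core_valency < 1 or branching < 1 or generation < 0:
        raise ValueError("invalid dendrimer parameters")
    return core_valency * branching ** generation


# Hypothetical tetravalent core with two-fold branching.
for g in range(5):
    print(g, terminal_groups(4, 2, g))
```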

c. Z is an insoluble support or a substrate 

In other embodiments, Z can be an insoluble support or a
substrate, such as a particulate solid support, such as a silicon or other
"bead" or microsphere, or solid surface so that the surface presents the
functional groups (X, Y, Q and, as needed, W). In these embodiments, Z
has bound to it one or a plurality of X moieties (typically, 1 to 100,
generally 1 to 10) and optionally at least one Q and/or Y moiety, and
also optionally one or more W moieties. Z, in these embodiments, can
have tens up to hundreds, thousands, millions, or more functional
moieties (groups) on its surface. For example, the capture compound can
be a silicon particle or an agarose or other particle with groups presented
on it. As discussed below, it further can be coated with a hydrophobic
material, such as lipid bilayers or other lipids that are used, for example, to
produce liposomes. In such embodiments, the resulting particles with a
hydrophobic surface and optional hydrophobic W groups are used in
methods for probing cell membrane environments and other intracellular
environments. Gentle lysis of cells can expose the intracellular
compartments and organelles, and hydrophobic capture compounds, such
as these, can be reacted with them, and the bound biomolecules
assessed by, for example, mass spectrometry or further treated to release
the contents of the compartments and organelles and reacted with the
capture compounds or other capture compounds.

In embodiments in which Z is an insoluble support, the insoluble
support or substrate moiety Z can be based on a flat surface constructed,
for example, of glass, silicon, metal, plastic or a composite or other
suitable surface; or can be in the form of a "bead" or particle, such as a
silica gel, a controlled pore glass, a magnetic or cellulose bead; or can be
a pin, including an array of pins suitable for combinatorial synthesis or
analysis. Substrates can be fabricated from virtually any insoluble or
solid material. Examples include silica gel, glass (e.g., controlled-pore glass
(CPG)), nylon, Wang resin, Merrifield resin, dextran cross-linked with
epichlorohydrin (e.g., Sephadex®), agarose (e.g., Sepharose®), cellulose,
magnetic beads, Dynabeads, a metal surface (e.g., steel, gold, silver,
aluminum, silicon and copper), and a plastic material (e.g., polyethylene,
polypropylene, polyamide, polyester, polyvinylidenedifluoride (PVDF)).
Exemplary substrates include, but are not limited to, beads (e.g., silica gel,
controlled pore glass, magnetic, dextran cross-linked with
epichlorohydrin (e.g., Sephadex®), agarose (e.g., Sepharose®), cellulose),
capillaries, flat supports such as glass fiber filters, glass surfaces, metal
surfaces (steel, gold, silver, aluminum, copper and silicon), plastic
materials including multiwell plates or membranes (e.g., of polyethylene,
polypropylene, polyamide, polyvinylidenedifluoride), pins (e.g., arrays of
pins suitable for combinatorial synthesis or analysis), or beads in pits of flat
surfaces such as wafers (e.g., silicon wafers) with or without plates. The
solid support is in any desired form, including, but not limited to, a bead,
capillary, plate, membrane, wafer, comb, pin, a wafer with pits, an array
of pits or nanoliter wells and other geometries and forms known to those
of skill in the art. Supports include flat surfaces designed to receive or
link samples at discrete loci.

In one embodiment, the solid supports or substrates Z are "beads"
(i.e., particles, typically in the range of less than 200 μm or less than 50
μm in their largest dimension) including, but not limited to, polymeric,
magnetic, colored, RF-tagged, and other such beads. The beads can be
made from hydrophobic materials, including, but not limited to,
polystyrene, polyethylene, polypropylene or teflon, or hydrophilic
materials, including, but not limited to, cellulose, dextran cross-linked
with epichlorohydrin (e.g., Sephadex®), agarose (e.g., Sepharose®),
polyacrylamide, silica gel and controlled pore glass beads or particles.
These types of capture compounds can be reacted in liquid phase in
suspension, and then spun down or otherwise removed from the reaction
medium, and the resulting complexes analyzed, such as by mass
spectrometry. They can be sorted using the Q function to bind to distinct
loci on a solid support, or they can include a label to permit addressing,
such as a radio frequency tag or a colored label or bar code or other
symbology imprinted thereon. These can be sorted according to the
label, which serves as the "Q" function, and then analyzed by mass
spectrometry.

In further embodiments, the insoluble support or substrate Z
moieties optionally can possess spacer groups S¹ and/or S², or for
embodiments where Z is a cleavable linkage, L. The S¹, S² and/or L
moieties are attached to the surface of the insoluble support or substrate.

In these embodiments, the density of the biomolecule to be
analyzed, and thus signal intensity of the subsequent analysis, is
increased relative to embodiments where Z is a divalent group. In certain
embodiments, an appropriate array of single stranded oligonucleotides or
oligonucleotide analogs that are complementary to the single stranded
oligonucleotide or oligonucleotide analog sorting functions Q will be
employed in the methods provided herein.

d. Mass modified Z moieties

In other embodiments, including embodiments where Z is a
cleavable moiety, Z includes a mass modifying tag. In certain
embodiments, the mass modifying tag is attached to the cleavable linker
L. In one embodiment, the mass modified Z moiety has the formula:
-(S¹)t-M(R¹⁵)a-(S²)b-L-T-, where S¹, t, M, R¹⁵, a, S², b and L are selected as
above; and T is a mass modifying tag. Mass modifying tags for use
herein include, but are not limited to, groups of formula -X¹R¹⁰-, where X¹
is a divalent group such as -O-, -O-C(O)-(CH₂)y-C(O)O-, -NH-C(O)-,
-C(O)-NH-, -NH-C(O)-(CH₂)y-C(O)O-, -NH-C(S)-NH-, -O-P(O-alkyl)-O-,
-O-SO₂-O-, -O-C(O)-CH₂-S-, -S-, -NH- and

[structural formula not reproduced; surviving labels: O, N, Me]

and R¹⁰ is a divalent group including -(CH₂CH₂O)z-CH₂CH₂O-,
-(CH₂CH₂O)z-CH₂CH₂O-alkylene, alkylene, alkenylene, alkynylene, arylene,
heteroarylene, -(CH₂)z-CH₂-O-, -(CH₂)z-CH₂-O-alkylene,
-(CH₂CH₂NH)z-CH₂CH₂NH-, -CH₂-CH(OH)-CH₂O-, -Si(R¹²)(R¹³)-, -CHF- and
-CF₂-; where y is an integer from 1 to 20; z is an integer from 0 to 200;
R¹¹ is the side chain of an α-amino acid; and R¹² and R¹³ are each
independently selected from alkyl, aryl and aralkyl.

In other embodiments, -X¹R¹⁰- is selected from -S-S-, -S-,
-(NH-(CH₂)y-NH-C(O)-(CH₂)y-C(O))z-NH-(CH₂)y-NH-C(O)-(CH₂)y-C(O)O-,
-(NH-(CH₂)y-C(O))z-NH-(CH₂)y-C(O)O-,
-(NH-CH(R¹¹)-C(O))z-NH-CH(R¹¹)-C(O)O-, and
-(O-(CH₂)y-C(O))z-NH-(CH₂)y-C(O)O-.

In the above embodiments, where R¹⁰ is an oligo-/polyethylene
glycol derivative, the mass-modifying increment is 44, i.e., five different
mass-modified species can be generated by changing z from 0 to 4, thus
adding mass units of 45 (z = 0), 89 (z = 1), 133 (z = 2), 177 (z = 3)
and 221 (z = 4) to the compounds. The oligo/polyethylene glycols also
can be monoalkylated by a lower alkyl such as methyl, ethyl, propyl,
isopropyl, t-butyl and the like.
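
The arithmetic behind this series can be sketched as follows. This is only an illustrative calculation, assuming nominal (integer) masses as quoted in the text; the names `PEG_BASE_MASS`, `PEG_INCREMENT` and `peg_mass_shift` are hypothetical and do not appear in the disclosure.

```python
# Illustrative sketch only: nominal mass added by an oligo-/polyethylene
# glycol mass modifying tag, using the values quoted in the text.
# All names here are hypothetical helpers, not part of the disclosure.

PEG_BASE_MASS = 45   # mass units added when z = 0 (value stated in the text)
PEG_INCREMENT = 44   # mass units per additional ethylene glycol unit


def peg_mass_shift(z: int) -> int:
    """Return the nominal mass added for a tag with repeat index z."""
    if z < 0:
        raise ValueError("z must be a non-negative integer")
    return PEG_BASE_MASS + PEG_INCREMENT * z


# Five mass-modified species, z = 0 to 4.
peg_series = [peg_mass_shift(z) for z in range(5)]
print(peg_series)
```

Each step in z shifts the species by the same 44 mass units, which is what allows differently tagged compounds to be distinguished in a single mass spectrum.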

Other mass modifying tags include, but are not limited to, -CHF-,
-CF₂-, -Si(CH₃)₂-, -Si(CH₃)(C₂H₅)- and -Si(C₂H₅)₂-. In other embodiments,
the mass modifying tags include homo- or heteropeptides. A non-limiting
example that generates mass-modified species with a mass increment of
57 is an oligoglycine, which produces mass modifications of, e.g., 74 (y =
1, z = 0), 131 (y = 1, z = 2), 188 (y = 1, z = 3) or 245 (y = 1, z =
4). Oligoamides also can be used, e.g., mass-modifications of 74 (y = 1,
z = 0), 88 (y = 2, z = 0), 102 (y = 3, z = 0), 116 (y = 4, z = 0),
etc., are obtainable. Those skilled in the art will appreciate that there are
numerous possibilities in addition to those exemplified herein for
introducing, in a predetermined manner, many different mass modifying
tags to the compounds provided herein.
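
The two peptide-type series quoted above follow simple linear rules, which can be sketched as below. This is a hedged illustration using only the nominal masses given in the text: the counter `steps` is a hypothetical index that counts 57-unit increments from the first listed value and is not claimed to equal the z index printed in the text, and the 14-unit methylene increment is inferred from the listed oligoamide values (74, 88, 102, 116).

```python
# Illustrative sketch only: nominal mass shifts for the oligoglycine and
# oligoamide examples. Function and constant names are hypothetical.

BASE_MASS = 74            # first value listed for both series (y = 1, z = 0)
GLYCINE_INCREMENT = 57    # mass increment per glycine unit, as stated
METHYLENE_INCREMENT = 14  # inferred from the listed values 74, 88, 102, 116


def oligoglycine_mass_shift(steps: int) -> int:
    """Nominal mass after `steps` increments of 57 from the base value."""
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return BASE_MASS + GLYCINE_INCREMENT * steps


def oligoamide_mass_shift(y: int) -> int:
    """Nominal mass for chain-length index y (y >= 1) with z = 0."""
    if y < 1:
        raise ValueError("y must be at least 1")
    return BASE_MASS + METHYLENE_INCREMENT * (y - 1)


glycine_series = [oligoglycine_mass_shift(n) for n in range(4)]
amide_series = [oligoamide_mass_shift(y) for y in range(1, 5)]
print(glycine_series, amide_series)
```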

In other embodiments, R¹⁵ and/or S² can be functionalized with
-X¹R¹⁰H or -X¹R¹⁰-alkyl, where X¹ and R¹⁰ are defined as above, to serve
as mass modifying tags.

2. Reactivity Functions "X" 

Reactivity functions ("X") confer on the compounds the
ability to bind either covalently or with a high affinity (greater than 10⁹,
generally greater than 10¹⁰ or 10¹¹ liters/mole, typically greater than that
of a monoclonal antibody, and typically stable to mass spectrometric analysis,
such as MALDI-MS) to a biomolecule, particularly proteins, including
functional groups thereon, which include post-translationally added
groups. Generally the binding is covalent or is of such affinity that it is
stable under conditions of analysis, such as mass spectral, including
MALDI-TOF, analysis. Exemplary groups are set forth herein (see, e.g.,
Figure 16, and the discussion below).
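
To put the affinity thresholds in more familiar units, the short sketch below converts them to dissociation constants. It assumes that the affinities quoted in liters/mole are association constants (Ka) for a simple 1:1 binding equilibrium, so that Kd = 1/Ka; that reading, and the function name `dissociation_constant_molar`, are assumptions made here for illustration only.

```python
# Illustrative sketch only: convert an association constant Ka (liters/mole)
# into a dissociation constant Kd (moles/liter), assuming simple 1:1 binding.
# The function name is a hypothetical helper, not part of the disclosure.

def dissociation_constant_molar(ka_liters_per_mole: float) -> float:
    """Return Kd = 1 / Ka for a 1:1 binding equilibrium."""
    if ka_liters_per_mole <= 0:
        raise ValueError("Ka must be positive")
    return 1.0 / ka_liters_per_mole


# Thresholds named in the text, expressed as Kd in nanomolar.
affinity_thresholds = [1e9, 1e10, 1e11]
kd_nanomolar = [dissociation_constant_molar(ka) * 1e9
                for ka in affinity_thresholds]
print(kd_nanomolar)
```

Under this assumption the three thresholds correspond to dissociation constants of roughly 1 nM, 0.1 nM and 0.01 nM, respectively.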

In the compounds provided herein, X is a moiety that binds to or
interacts with the surface of a biomolecule, including, but not limited to,
the surface of a protein; an amino acid side chain of a protein; or an
active site of an enzyme (protein) or to functional groups of other
biomolecules, including lipids and polysaccharides.

Thus, for example, X is a group that reacts or interacts with
functionalities on the surface of a protein to form covalent or non-covalent
bonds with high affinity. A wide selection of different functional
groups is available for X to interact with a protein. For example, X can
act either as a nucleophile or an electrophile to form covalent bonds upon
reaction with the amino acid residues on the surface of a protein.
Exemplary reagents that bind covalently to amino acid side chains
include, but are not limited to, protecting groups for hydroxyl, carboxyl,
amino, amide, and thiol moieties, including, for example, those disclosed
in T.W. Greene and P.G.M. Wuts, "Protective Groups in Organic
Synthesis," 3rd ed. (1999, Wiley Interscience); photoreactive groups; and
Diels Alder couples (i.e., a diene on one side and a single double bond on
the other side).

Hydroxyl protecting groups for use as X groups herein include, but
are not limited to:

(i) ethers such as methyl, substituted methyl (methoxymethyl,
methylthiomethyl, (phenyldimethylsilyl)methoxymethyl, benzyloxymethyl,
p-methoxybenzyloxymethyl, p-nitrobenzyloxymethyl,
o-nitrobenzyloxymethyl, (4-methoxyphenoxy)methyl, guaiacolmethyl,
t-butoxymethyl, 4-pentenyloxymethyl, siloxymethyl,
2-methoxyethoxymethyl, 2,2,2-trichloroethoxymethyl,
bis(2-chloroethoxy)methyl, 2-(trimethylsilyl)ethoxymethyl, menthoxymethyl,
tetrahydropyranyl, 3-bromotetrahydropyranyl, tetrahydrothiopyranyl,
1-methoxycyclohexyl, 4-methoxytetrahydropyranyl,
4-methoxytetrahydrothiopyranyl, 4-methoxytetrahydrothiopyranyl
S,S-dioxide, 1-[(2-chloro-4-methyl)phenyl]-4-methoxypiperidin-4-yl,
1-(2-fluorophenyl)-4-methoxypiperidin-4-yl, 1,4-dioxan-2-yl, tetrahydrofuranyl,
tetrahydrothiofuranyl,
2,3,3a,4,5,6,7,7a-octahydro-7,8,8-trimethyl-4,7-methanobenzofuran-2-yl),
substituted ethyl (1-ethoxyethyl, 1-(2-chloroethoxy)ethyl,
1-[2-(trimethylsilyl)ethoxy]ethyl, 1-methyl-1-methoxyethyl,
1-methyl-1-benzyloxyethyl, 1-methyl-1-benzyloxy-2-fluoroethyl,
1-methyl-1-phenoxyethyl, 2,2,2-trichloroethyl,
1,1-dianisyl-2,2,2-trichloroethyl, 1,1,1,3,3,3-hexafluoro-2-phenylisopropyl,
2-trimethylsilylethyl, 2-(benzylthio)ethyl, 2-(phenylselenyl)ethyl), t-butyl,
allyl, propargyl, p-chlorophenyl, p-methoxyphenyl, p-nitrophenyl,
2,4-dinitrophenyl, 2,3,5,6-tetrafluoro-4-(trifluoromethyl)phenyl, benzyl,
substituted benzyl (p-methoxybenzyl, 3,4-dimethoxybenzyl,
o-nitrobenzyl, p-nitrobenzyl, p-halobenzyl, 2,6-dichlorobenzyl,
p-phenylbenzyl, 2,6-difluorobenzyl, p-acylaminobenzyl,
p-azidobenzyl, 4-azido-3-chlorobenzyl, 2-trifluoromethylbenzyl,
p-(methylsulfinyl)benzyl), 2- and 4-picolyl, 3-methyl-2-picolyl N-oxido,
2-quinolinylmethyl, 1-pyrenylmethyl, diphenylmethyl, p,p'-dinitrobenzhydryl,
5-dibenzosuberyl, triphenylmethyl, α-naphthyldiphenylmethyl,
p-methoxyphenyldiphenylmethyl, di(p-methoxyphenyl)phenylmethyl,
tri(p-methoxyphenyl)methyl, 4-(4'-bromophenacyloxy)phenyldiphenylmethyl,
4,4',4''-tris(4,5-dichlorophthalimidophenyl)methyl,
4,4',4''-tris(levulinoyloxyphenyl)methyl, 4,4',4''-tris(benzoyloxyphenyl)methyl,
4,4'-dimethoxy-3''-[N-(imidazolylmethyl)]trityl,
4,4'-dimethoxy-3''-[N-(imidazolylethyl)carbamoyl]trityl,
1,1-bis(4-methoxyphenyl)-1'-pyrenylmethyl,
4-(17-tetrabenzo[a,c,g,i]fluorenylmethyl)-4,4''-dimethoxytrityl, 9-anthryl,
9-(9-phenyl)xanthenyl, 9-(9-phenyl-10-oxo)anthryl, 1,3-benzodithiolan-2-yl,
benzisothiazolyl S,S-dioxido, silyl
ethers (trimethylsilyl, triethylsilyl, triisopropylsilyl, dimethylisopropylsilyl,
diethylisopropylsilyl, dimethylthexylsilyl, t-butyldimethylsilyl,
t-butyldiphenylsilyl, tribenzylsilyl, tri-p-xylylsilyl, triphenylsilyl,
diphenylmethylsilyl, di-t-butylmethylsilyl, tris(trimethylsilyl)silyl (sisyl),
(2-hydroxystyryl)dimethylsilyl, (2-hydroxystyryl)diisopropylsilyl,
t-butylmethoxyphenylsilyl, t-butoxydiphenylsilyl);

(ii) esters such as formate, benzoylformate, acetate, substituted
acetate (chloroacetate, dichloroacetate, trichloroacetate, trifluoroacetate,
methoxyacetate, triphenylmethoxyacetate, phenoxyacetate,
p-chlorophenoxyacetate, phenylacetate, p-P-phenylacetate,
diphenylacetate), nicotinate, 3-phenylpropionate, 4-pentenoate,
4-oxopentanoate (levulinate), 4,4-(ethylenedithio)pentanoate,
5-[3-bis(4-methoxyphenyl)hydroxymethylphenoxy]levulinate, pivaloate,
1-adamantoate, crotonate, 4-methoxycrotonate, benzoate,
p-phenylbenzoate, 2,4,6-trimethylbenzoate (mesitoate), carbonates (methyl,
methoxymethyl, 9-fluorenylmethyl, ethyl, 2,2,2-trichloroethyl,
1,1-dimethyl-2,2,2-trichloroethyl, 2-(trimethylsilyl)ethyl,
2-(phenylsulfonyl)ethyl, 2-(triphenylphosphonio)ethyl, isobutyl, vinyl, allyl,
p-nitrophenyl, benzyl, p-methoxybenzyl, 3,4-dimethoxybenzyl,
o-nitrobenzyl, p-nitrobenzyl, 2-dansylethyl, 2-(4-nitrophenyl)ethyl,
2-(2,4-dinitrophenyl)ethyl, 2-cyano-1-phenylethyl, S-benzyl thiocarbonate,
4-ethoxy-1-naphthyl, methyl dithiocarbonate), 2-iodobenzoate,
4-azidobutyrate, 4-nitro-4-methylpentanoate, o-(dibromomethyl)benzoate,
2-formylbenzenesulfonate, 2-(methylthiomethoxy)ethyl carbonate,
4-(methylthiomethoxy)butyrate, 2-(methylthiomethoxymethyl)benzoate,
2-(chloroacetoxymethyl)benzoate, 2-[(2-chloroacetoxy)ethyl]benzoate,
2-[2-(benzyloxy)ethyl]benzoate, 2-[2-(4-methoxybenzyloxy)ethyl]benzoate,
2,6-dichloro-4-methylphenoxyacetate,
2,6-dichloro-4-(1,1,3,3-tetramethylbutyl)phenoxyacetate,
2,4-bis(1,1-dimethylpropyl)phenoxyacetate, chlorodiphenylacetate, isobutyrate,
monosuccinoate, (E)-2-methyl-2-butenoate (tigloate),
o-(methoxycarbonyl)benzoate, p-P-benzoate, α-naphthoate, nitrate, alkyl
N,N,N',N'-tetramethylphosphorodiamidate, 2-chlorobenzoate,
4-bromobenzoate, 4-nitrobenzoate, 3',5'-dimethoxybenzoin, a wild and
woolly photolabile fluorescent ester, N-phenylcarbamate, borate,
dimethylphosphinothioyl, 2,4-dinitrophenylsulfenate; and

(iii) sulfonates (sulfate, allylsulfonate, methanesulfonate (mesylate),
benzylsulfonate, tosylate, 2-[(4-nitrophenyl)ethyl]sulfonate).

Carboxyl protecting groups for use as X groups herein include, but 
are not limited to: 

(i) esters such as enzymatically cleavable esters (heptyl,
2-N-(morpholino)ethyl, choline, (methoxyethoxy)ethyl, methoxyethyl), methyl,
substituted methyl (9-fluorenylmethyl, methoxymethyl, methylthiomethyl,
tetrahydropyranyl, tetrahydrofuranyl, methoxyethoxymethyl,
2-(trimethylsilyl)ethoxymethyl, benzyloxymethyl, pivaloyloxymethyl,
phenylacetoxymethyl, triisopropylsilylmethyl, cyanomethyl, acetol,
phenacyl, p-bromophenacyl, α-methylphenacyl, p-methoxyphenacyl,
desyl, carboxamidomethyl, p-azobenzenecarboxamidomethyl,
N-phthalimidomethyl), 2-substituted ethyl (2,2,2-trichloroethyl, 2-haloethyl,
ω-chloroalkyl, 2-(trimethylsilyl)ethyl, 2-methylthioethyl,
1,3-dithianyl-2-methyl, 2-(p-nitrophenylsulfenyl)ethyl,
2-(p-toluenesulfonyl)ethyl, 2-(2'-pyridyl)ethyl, 2-(p-methoxyphenyl)ethyl,
2-(diphenylphosphino)ethyl, 1-methyl-1-phenylethyl,
2-(4-acetyl-2-nitrophenyl)ethyl, 2-cyanoethyl), t-butyl,
3-methyl-3-pentyl, dicyclopropylmethyl, 2,4-dimethyl-3-pentyl,
dicyclopropylmethyl, cyclopentyl, cyclohexyl, allyl, methallyl,
2-methylbut-3-en-2-yl, 3-methylbut-2-enyl (prenyl), 3-buten-1-yl,
4-(trimethylsilyl)-2-buten-1-yl, cinnamyl, α-methylcinnamyl, prop-2-ynyl
(propargyl), phenyl, 2,6-dialkylphenyl (2,6-dimethylphenyl,
2,6-diisopropylphenyl, 2,6-di-t-butyl-4-methylphenyl,
2,6-di-t-butyl-4-methoxyphenyl), p-(methylthio)phenyl, pentafluorophenyl,
benzyl, substituted benzyl (triphenylmethyl, diphenylmethyl,
bis(o-nitrophenyl)methyl, 9-anthrylmethyl, 2-(9,10-dioxo)anthrylmethyl,
5-dibenzosuberyl, 1-pyrenylmethyl, 2-(trifluoromethyl)-6-chromonylmethyl,
2,4,6-trimethylbenzyl, p-bromobenzyl, o-nitrobenzyl, p-nitrobenzyl,
p-methoxybenzyl, 2,6-dimethoxybenzyl, 4-(methylsulfinyl)benzyl,
4-sulfobenzyl, 4-azidomethoxybenzyl,
4-{N-[1-(4,4-dimethyl-2,6-dioxocyclohexylidene)-3-methylbutyl]amino}benzyl,
piperonyl, 4-picolyl,
p-P-benzyl), silyl (trimethylsilyl, triethylsilyl, t-butyldimethylsilyl,
i-propyldimethylsilyl, phenyldimethylsilyl, di-t-butylmethylsilyl,
triisopropylsilyl), activated (thiol), oxazoles, 2-alkyl-1,3-oxazoline,
4-alkyl-5-oxo-1,3-oxazolidine,
2,2-bistrifluoromethyl-4-alkyl-5-oxo-1,3-oxazolidine,
5-alkyl-4-oxo-1,3-dioxolane, dioxanones, ortho esters, Braun
ortho ester, pentaaminocobalt(III) complex, stannyl (triethylstannyl,
tri-n-butylstannyl);

(ii) amides (N,N-dimethyl, pyrrolidinyl, piperidinyl,
5,6-dihydrophenanthridinyl, o-nitroanilide, N-7-nitroindolyl,
N-8-nitro-1,2,3,4-tetrahydroquinolyl, 2-(2-aminophenyl)acetaldehyde
dimethyl acetal amide, p-P-benzenesulfonamide);

(iii) hydrazides (N-phenyl, N,N'-diisopropyl); and

(iv) tetraalkylammonium salts.

Thiol protecting groups for use as X groups herein include, but are
not limited to:

(i) thioethers (S-alkyl, S-benzyl, S-p-methoxybenzyl, S-o- or
p-hydroxy- or acetoxybenzyl, S-p-nitrobenzyl, S-2,4,6-trimethylbenzyl,
S-2,4,6-trimethoxybenzyl, S-4-picolyl, S-2-quinolinylmethyl, S-2-picolyl
N-oxido, S-9-anthrylmethyl, S-9-fluorenylmethyl, S-xanthenyl,
S-ferrocenylmethyl); S-diphenylmethyl, substituted S-diphenylmethyl and
S-triphenylmethyl (S-diphenylmethyl, S-bis(4-methoxyphenyl)methyl,
S-5-dibenzosuberyl, S-triphenylmethyl, S-diphenyl-4-pyridylmethyl), S-phenyl,
S-2,4-dinitrophenyl, S-t-butyl, S-1-adamantyl, substituted S-methyl
including monothio, dithio and aminothioacetals (S-methoxymethyl,
S-isobutoxymethyl, S-benzyloxymethyl, S-2-tetrahydropyranyl,
S-benzylthiomethyl, S-phenylthiomethyl, thiazolidine, S-acetamidomethyl,
S-trimethylacetamidomethyl, S-benzamidomethyl,
S-allyloxycarbonylaminomethyl, S-phenylacetamidomethyl,
S-phthalimidomethyl, S-acetyl-, S-carboxy-, and S-cyanomethyl),
substituted S-ethyl (S-(2-nitro-1-phenyl)ethyl, S-2-(2,4-dinitrophenyl)ethyl,
S-2-(4'-pyridyl)ethyl, S-2-cyanoethyl, S-2-(trimethylsilyl)ethyl,
S-(1-m-nitrophenyl-2-benzoyl)ethyl, S-2-phenylsulfonylethyl,
S-1-(4-methylphenylsulfonyl)-2-methylprop-2-yl), silyl;

(ii) thioesters (S-acetyl, S-benzoyl, S-trifluoroacetyl,
S-N-[[(p-biphenylyl)isopropoxy]carbonyl]-N-methyl-γ-aminothiobutyrate,
S-N-(t-butoxycarbonyl)-N-methyl-γ-aminothiobutyrate), thiocarbonates
(S-2,2,2-trichloroethoxycarbonyl, S-t-butoxycarbonyl, S-benzyloxycarbonyl,
S-p-methoxybenzyloxycarbonyl), thiocarbamates (S-(N-ethyl),
S-(N-methoxymethyl));

(iii) unsymmetrical disulfides (S-ethyl, S-t-butyl, substituted
S-phenyl disulfides);

(iv) sulfenyl derivatives (S-sulfonate, S-sulfenylthiocarbonate,
S-3-nitro-2-pyridinesulfenyl sulfide,
S-[tricarbonyl[1,2,3,4,5-η]-2,4-cyclohexadien-1-yl]-iron(1+), oxathiolone); and

(v) S-methylsulfonium salt, S-benzyl- and
S-4-methoxybenzylsulfonium salt, S-1-(4-phthalimidobutyl)sulfonium salt,
S-(dimethylphosphino)thioyl, S-(diphenylphosphino)thioyl.

Amino protecting groups for use as X groups herein include, but
are not limited to:

(i) carbamates (methyl, ethyl, 9-fluorenylmethyl,
9-(2-sulfo)fluorenylmethyl, 9-(2,7-dibromo)fluorenylmethyl,
17-tetrabenzo[a,c,g,i]fluorenylmethyl, 2-chloro-3-indenylmethyl,
benz[f]inden-3-ylmethyl,
2,7-di-t-butyl-[9-(10,10-dioxo-10,10,10,10-tetrahydrothioxanthyl)]methyl,
1,1-dioxobenzo[b]thiophene-2-ylmethyl, substituted ethyl
(2,2,2-trichloroethyl, 2-trimethylsilylethyl, 2-phenylethyl,
1-(1-adamantyl)-1-methylethyl, 2-chloroethyl, 1,1-dimethyl-2-haloethyl,
1,1-dimethyl-2,2-dibromoethyl, 1,1-dimethyl-2,2,2-trichloroethyl,
1-methyl-1-(4-biphenylyl)ethyl, 1-(3,5-di-t-butylphenyl)-1-methylethyl,
2-(2'- and 4'-pyridyl)ethyl, 2,2-bis(4'-nitrophenyl)ethyl,
N-(2-pivaloylamino)-1,1-dimethylethyl,
2-[(2-nitrophenyl)dithio]-1-phenylethyl,
2-(N,N-dicyclohexylcarboxamido)ethyl), t-butyl, 1-adamantyl, 2-adamantyl, vinyl,
allyl, 1-isopropylallyl, cinnamyl, 4-nitrocinnamyl, 3-(3'-pyridyl)prop-2-enyl,
8-quinolyl, N-hydroxypiperidinyl, alkyldithio, benzyl, p-methoxybenzyl,
p-nitrobenzyl, p-bromobenzyl, p-chlorobenzyl, 2,4-dichlorobenzyl,
4-methylsulfinylbenzyl, 9-anthrylmethyl, diphenylmethyl, 2-methylthioethyl,
2-methylsulfonylethyl, 2-(p-toluenesulfonyl)ethyl, [2-(1,3-dithianyl)]methyl,
4-methylthiophenyl, 2,4-dimethylthiophenyl, 2-phosphonioethyl,
1-methyl-1-(triphenylphosphonio)ethyl, 1,1-dimethyl-2-cyanoethyl,
2-dansylethyl, 2-(4-nitrophenyl)ethyl, 4-phenylacetoxybenzyl, 4-azidobenzyl,
4-azidomethoxybenzyl, m-chloro-p-acyloxybenzyl,
p-(dihydroxyboryl)benzyl, 5-benzisoxazolylmethyl,
2-(trifluoromethyl)-6-chromonylmethyl, m-nitrophenyl, 3,5-dimethoxybenzyl,
1-methyl-1-(3,5-dimethoxyphenyl)ethyl, α-methylnitropiperonyl, o-nitrobenzyl,
3,4-dimethoxy-6-nitrobenzyl, phenyl(o-nitrophenyl)methyl,
2-(2-nitrophenyl)ethyl, 6-nitroveratryl, 4-methoxyphenacyl,
3',5'-dimethoxybenzoin, ureas (phenothiazinyl-(10)-carbonyl derivative,
N'-p-toluenesulfonylaminocarbonyl, N'-phenylaminothiocarbonyl), t-amyl,
S-benzyl thiocarbamate, butynyl, p-cyanobenzyl, cyclobutyl, cyclohexyl,
cyclopentyl, cyclopropylmethyl, p-decyloxybenzyl, diisopropylmethyl,
2,2-dimethoxycarbonylvinyl, o-(N',N'-dimethylcarboxamido)benzyl,
1,1-dimethyl-3-(N',N'-dimethylcarboxamido)propyl, 1,1-dimethylpropynyl,
di(2-pyridyl)methyl, 2-furanylmethyl, 2-iodoethyl, isobornyl, isobutyl,
isonicotinyl, p-(p'-methoxyphenylazo)benzyl, 1-methylcyclobutyl,
1-methylcyclohexyl, 1-methyl-1-cyclopropylmethyl,
1-methyl-1-(p-phenylazophenyl)ethyl, 1-methyl-1-phenylethyl,
1-methyl-1-(4'-pyridyl)ethyl, phenyl, p-(phenylazo)benzyl,
2,4,6-tri-t-butylphenyl, 4-(trimethylammonium)benzyl, 2,4,6-trimethylbenzyl);

(ii) amides (N-formyl, N-acetyl, N-chloroacetyl,
N-trichloroacetyl, N-trifluoroacetyl, N-phenylacetyl, N-3-phenylpropionyl,
N-4-pentenoyl, N-picolinoyl, N-3-pyridylcarboxamido, N-benzoylphenylalanyl
derivative, N-benzoyl, N-p-phenylbenzoyl, N-o-nitrophenylacetyl,
N-o-nitrophenoxyacetyl, N-3-(o-nitrophenyl)propionyl,
N-2-methyl-2-(o-nitrophenoxy)propionyl, N-3-methyl-3-nitrobutyryl,
N-o-nitrocinnamoyl, N-o-nitrobenzoyl,
N-3-(4-t-butyl-2,6-dinitrophenyl)-2,2-dimethylpropionyl,
N-o-(benzoyloxymethyl)benzoyl, N-(2-acetoxymethyl)benzoyl,
N-2-[(t-butyldiphenylsiloxy)methyl]benzoyl,
N-3-(3',6'-dioxo-2',4',5'-trimethylcyclohexa-1',4'-diene)-3,3-dimethylpropionyl,
N-o-hydroxy-trans-cinnamoyl, N-2-methyl-2-(o-phenylazophenoxy)propionyl,
N-4-chlorobutyryl, N-acetoacetyl, N-3-(p-hydroxyphenyl)propionyl,
(N'-dithiobenzyloxycarbonylamino)acetyl, N-acetylmethionine derivative,
4,5-diphenyl-3-oxazolin-2-one), cyclic imides (N-phthaloyl,
N-tetrachlorophthaloyl, N-4-nitrophthaloyl, N-dithiasuccinoyl,
N-2,3-diphenylmaleoyl, N-2,5-dimethylpyrrolyl,
N-2,5-bis(triisopropylsiloxy)pyrrolyl, N-1,1,4,4-tetramethyldisilylazacyclopentane
adduct, N-1,1,3,3-tetramethyl-1,3-disilaisoindolyl, 5-substituted
1,3-dimethyl-1,3,5-triazacyclohexan-2-one, 5-substituted
1,3-dibenzyl-1,3,5-triazacyclohexan-2-one, 1-substituted
3,5-dinitro-4-pyridonyl, 1,3,5-dioxazinyl);

(iii) N-alkyl and N-aryl amines (N-methyl, N-t-butyl, N-allyl,
N-[2-(trimethylsilyl)ethoxy]methyl, N-3-acetoxypropyl, N-cyanomethyl,
N-(1-isopropyl-4-nitro-2-oxo-3-pyrrolin-3-yl), N-2,4-dimethoxybenzyl,
N-2-azanorbornenyl, N-2,4-dinitrophenyl, quaternary ammonium salts,
N-benzyl, N-4-methoxybenzyl, N-2,4-dimethoxybenzyl, N-2-hydroxybenzyl,
N-diphenylmethyl, N-bis(4-methoxyphenyl)methyl, N-5-dibenzosuberyl,
N-triphenylmethyl, N-(4-methoxyphenyl)diphenylmethyl,
N-9-phenylfluorenyl, N-ferrocenylmethyl, N-2-picolylamine N'-oxide);

(iv) imines (N-1,1-dimethylthiomethylene, N-benzylidene,
N-p-methoxybenzylidene, N-diphenylmethylene,
N-[(2-pyridyl)mesityl]methylene, N-(N',N'-dimethylaminomethylene),
N-(N',N'-dibenzylaminomethylene), N-(N'-t-butylaminomethylene),
N,N'-isopropylidene, N-p-nitrobenzylidene, N-salicylidene,
N-5-chlorosalicylidene, N-(5-chloro-2-hydroxyphenyl)phenylmethylene,
N-cyclohexylidene, N-t-butylidene);

(v) enamines (N-(5,5-dimethyl-3-oxo-1-cyclohexenyl),
N-2,7-dichloro-9-fluorenylmethylene,
N-2-(4,4-dimethyl-2,6-dioxocyclohexylidene)ethyl,
N-4,4,4-trifluoro-3-oxo-1-butenyl,
N-1-isopropyl-4-nitro-2-oxo-3-pyrrolin-3-yl);

(vi) N-heteroatom derivatives (N-borane derivatives,
N-diphenylborinic acid derivative, N-diethylborinic acid derivative,
N-difluoroborinic acid derivative, N,N'-3,5-bis(trifluoromethyl)phenylboronic
acid derivative, N-[phenyl(pentacarbonylchromium- or -tungsten)]carbenyl,
N-copper or N-zinc chelate, 18-crown-6 derivative, N-nitro, N-nitroso,
N-oxide, triazene derivative, N-diphenylphosphinyl, N-dimethyl- and
diphenylthiophosphinyl, N-dialkyl phosphoryl, N-dibenzyl and diphenyl
phosphoryl, iminotriphenylphosphorane derivative, N-benzenesulfenyl,
N-o-nitrobenzenesulfenyl, N-2,4-dinitrobenzenesulfenyl,
N-pentachlorobenzenesulfenyl, N-2-nitro-4-methoxybenzenesulfenyl,
N-triphenylmethylsulfenyl, N-1-(2,2,2-trifluoro-1,1-diphenyl)ethylsulfenyl,
N-3-nitro-2-pyridinesulfenyl, N-p-toluenesulfonyl, N-benzenesulfonyl,
N-2,3,6-trimethyl-4-methoxybenzenesulfonyl, N-2,4,6-trimethoxybenzenesulfonyl,
N-2,6-dimethyl-4-methoxybenzenesulfonyl,
N-pentamethylbenzenesulfonyl,
N-2,3,5,6-tetramethyl-4-methoxybenzenesulfonyl,
N-4-methoxybenzenesulfonyl, N-2,4,6-trimethylbenzenesulfonyl,
N-2,6-dimethoxy-4-methylbenzenesulfonyl,
N-3-methoxy-4-t-butylbenzenesulfonyl,
N-2,2,5,7,8-pentamethylchroman-6-sulfonyl,
N-2- and 4-nitrobenzenesulfonyl, N-2,4-dinitrobenzenesulfonyl,
N-benzothiazole-2-sulfonyl, N-pyridine-2-sulfonyl, N-methanesulfonyl,
N-2-(trimethylsilyl)ethanesulfonyl, N-9-anthracenesulfonyl,
N-4-(4',8'-dimethoxynaphthylmethyl)benzenesulfonyl, N-benzylsulfonyl,
N-trifluoromethylsulfonyl, N-phenacylsulfonyl, N-t-butylsulfonyl);

(vii) imidazole protecting groups including N-sulfonyl derivatives
(N,N-dimethylsulfonyl, N-mesitylenesulfonyl, N-p-methoxyphenylsulfonyl,
N-benzenesulfonyl, N-p-toluenesulfonyl); carbamates (2,2,2-trichloroethyl,
2-(trimethylsilyl)ethyl, t-butyl, 2,4-dimethylpent-3-yl, cyclohexyl,
1,1-dimethyl-2,2,2-trichloroethyl, 1-adamantyl, 2-adamantyl); N-alkyl and
N-aryl derivatives (N-vinyl, N-2-chloroethyl, N-(1-ethoxy)ethyl,
N-2-(2'-pyridyl)ethyl, N-2-(4'-pyridyl)ethyl, N-2-(4'-nitrophenyl)ethyl),
N-trialkylsilyl derivatives (N-t-butyldimethylsilyl, N-triisopropylsilyl),
N-allyl, N-benzyl, N-p-methoxybenzyl, N-3,4-dimethoxybenzyl,
N-3-methoxybenzyl, N-3,5-dimethoxybenzyl, N-2-nitrobenzyl, N-4-nitrobenzyl,
N-2,4-dinitrophenyl, N-phenacyl, N-triphenylmethyl, N-diphenylmethyl,
N-(diphenyl-4-pyridylmethyl), N-(N',N'-dimethylamino), amino acetal
derivatives (N-hydroxymethyl, N-methoxymethyl, N-diethoxymethyl,
N-ethoxymethyl, N-(2-chloroethoxy)methyl,
N-[2-(trimethylsilyl)ethoxy]methyl, N-t-butoxymethyl,
N-t-butyldimethylsiloxymethyl, N-pivaloyloxymethyl, N-benzyloxymethyl,
N-dimethylaminomethyl, N-2-tetrahydropyranyl), amides (carbon dioxide
adduct, N-formyl, N-(N',N'-diethylureidyl), N-dichloroacetyl, N-pivaloyl,
N-diphenylthiophosphinyl); and

(viii) amide -NH protecting groups including amides (N-allyl,
N-t-butyl, N-dicyclopropylmethyl, N-methoxymethyl, N-methylthiomethyl,
N-benzyloxymethyl, N-2,2,2-trichloroethoxymethyl,
N-t-butyldimethylsiloxymethyl, N-pivaloyloxymethyl, N-cyanomethyl,
N-pyrrolidinomethyl, N-methoxy, N-benzyloxy, N-methylthio,
N-triphenylmethylthio, N-t-butyldimethylsilyl, N-triisopropylsilyl,
N-4-methoxyphenyl, N-3,4-dimethoxyphenyl, N-4-(methoxymethoxy)phenyl,
N-2-methoxy-1-naphthyl, N-benzyl, N-4-methoxybenzyl,
N-2,4-dimethoxybenzyl, N-3,4-dimethoxybenzyl, N-o-nitrobenzyl,
N-bis(4-methoxyphenyl)methyl, N-bis(4-methoxyphenyl)phenylmethyl,
N-bis(4-methylsulfinylphenyl)methyl, N-triphenylmethyl, N-9-phenylfluorenyl,
N-bis(trimethylsilyl)methyl, N-t-butoxycarbonyl, N-benzyloxycarbonyl,
N-methoxycarbonyl, N-ethoxycarbonyl, N-p-toluenesulfonyl,
N,O-isopropylidene ketal, N,O-benzylidene acetal, N,O-formylidene acetal,
N-butenyl, N-ethenyl, N-[(E)-(2-methoxycarbonyl)vinyl], N-diethoxymethyl,
N-(1-methoxy-2,2-dimethylpropyl), N-2-(4-methylphenylsulfonyl)ethyl).

These protecting groups react with amino acid side chains such as
hydroxyl (serine, threonine, tyrosine); amino (lysine, arginine, histidine,
proline); amide (glutamine, asparagine); carboxylic acid (aspartic acid,
glutamic acid); and sulfur derivatives (cysteine, methionine), and are
readily adaptable for use in the capture compounds as the reactive moiety
X.

In addition to the wide range of group-specific reagents that
are known to persons of skill in the art, reagents that are known in
natural product chemistry also can serve as a basis for X in forming
covalent linkages. Other choices for X include protein purification
dyes, such as acridine or methylene blue, which have a strong affinity for
certain proteins.

Alternatively, X can act as an electron donor or an electron
acceptor to form non-covalent bonds or a complex, such as a
charge-transfer complex, with a biomolecule, including, but not limited to, a
protein, such that the resulting bond has a high stability (i.e., stable under
conditions of mass spectrometric analysis, such as MALDI-TOF, as
defined above). These reagents include those that interact strongly and
with high specificity with biomolecules, including, but not limited to,
proteins, without forming covalent bonds, through the interaction of
complementary affinity surfaces. For example, one member of a well known
binding pair, such as biotin or streptavidin, antibody or antigen, receptor
or ligand, lectin or carbohydrate and other similar types of reagents, is
readily adaptable for use in these compounds as the reactive moiety X
that will react with high affinity with biomolecules with surfaces similar to
or identical to the other member of the binding pair. These moieties are
selected so that the resulting conjugates (also referred to herein as
complexes) have interactions that are sufficiently stable
for suitable washing of the unbound biomolecules, including, but not
limited to, proteins, out of the complex biological mixtures.

The reactivity of X can be influenced by one or more selectivity
functions Y on the core, i.e., M in the formula above, particularly where
S² is not present.

The Y function, discussed below, is employed for electronic (e.g.,
mesomeric, inductive) and/or steric effects to modulate the reactivity of X
and the stability of the resulting X-biomolecule linkage. In these
embodiments, biomolecule mixtures, including, but not limited to, protein
mixtures, can react and be analyzed due to the modulation by Y, which
changes the electronic or steric properties of X and, therefore, increases
the selectivity of the reaction of X with the biomolecule.

In certain embodiments, X is an active ester, such as
-C(=O)O-Ph-pNO₂, -C(=O)O-C₆F₅ or -C(=O)-O-(N-succinimidyl); an active
halo moiety, such as an α-halo ether or an α-halo carbonyl group,
including, but not limited to, -OCH₂-I, -OCH₂-Br, -OCH₂-Cl, -C(O)CH₂I,
-C(O)CH₂Br and -C(O)CH₂Cl; amino acid side chain-specific functional
groups, such as maleimido (for cysteine), a metal complex, including gold
or mercury complexes (for cysteine or methionine), an epoxide or
isothiocyanate (for arginine or lysine); reagents that bind to active sites of
enzymes, including, but not limited to, transition state analogs;
antibodies, e.g., against phosphorylated peptides; antigens, such as a
phage display library; haptens; biotin; avidin; or streptavidin.

3. Selectivity Functions "Y" 

The selectivity function ("Y") serves to modulate the reactivity
function by reducing the number of groups to which the reactivity
function binds, such as by steric hindrance and other interactions. It is a
group that modifies the steric and/or electronic (e.g., mesomeric,
inductive effects) properties, as well as the resulting affinity properties,
of the capture compound. Selectivity functions include any functional
groups that increase the selectivity of the reactivity group so that it binds
to fewer different biomolecules than in the absence of the selectivity
function or binds with greater affinity to biomolecules than in its
absence. In the capture compounds provided herein, Y is allowed to be
extensively varied depending on the goal to be achieved regarding steric
hindrance and electronic factors as they relate to modulating the
reactivity of the cleavable bond L, if present, and the reactive
functionality X. For example, a reactivity function X can be selected to
bind to amine groups on proteins; the selectivity function can be selected
to ensure that only groups exposed on the surface can be accessed. The
selectivity function is such that the compounds bind to or react with (via
the reactivity function) fewer different biomolecules when it is part of the
molecule than when it is absent and/or the compounds bind with greater
specificity and higher affinity. The selectivity function can be attached
directly to a compound or can be attached via a linker, such as CH₂CO₂
or CH₂-O-(CH₂)ₙ-O, where n is an integer from 1 to 12, or 1 to 6, or 2 to
4. See, e.g., Figure 17 and the discussion below for exemplary selectivity
functions.

In certain embodiments, each Y is independently a group that
modifies the affinity properties and/or steric and/or electronic (e.g.,
mesomeric, inductive effects) properties of the resulting capture
compound. For example, Y, in certain embodiments, is selected from
ATP analogs and inhibitors; peptides and peptide analogs;
polyethyleneglycol (PEG); activated esters of amino acids, isolated or
within a peptide; cytochrome C; and hydrophilic trityl groups.

In another embodiment, Y is a small molecule moiety, a natural
product, a protein agonist or antagonist, a peptide or an antibody (see,
e.g., Figure 17). In another embodiment, Y is a hydrophilic compound or
protein (e.g., PEG or trityl ether), a hydrophobic compound or protein
(e.g., polar aromatics, lipids, glycolipids, phosphotriesters,
oligosaccharides), a positively or negatively charged group, a small
molecule, a pharmaceutical compound or a biomolecule that creates
defined secondary or tertiary structures.

In other embodiments, Y is a group that is a component of a
luminescent, including fluorescent, phosphorescent, chemiluminescent
and bioluminescent, system, or is a group that can be detected in a
colorimetric assay; in certain embodiments, Y is a monovalent group
selected from straight or branched chain alkyl, straight or branched chain
alkenyl, straight or branched chain alkynyl, cycloalkyl, cycloalkenyl,
cycloalkynyl, heterocyclyl, straight or branched chain heterocyclylalkyl,
straight or branched chain heterocyclylalkenyl, straight or branched chain
heterocyclylalkynyl, aryl, straight or branched chain arylalkyl, straight or
branched chain arylalkenyl, straight or branched chain arylalkynyl,
heteroaryl, straight or branched chain heteroarylalkyl, straight or branched
chain heteroarylalkenyl, straight or branched chain heteroarylalkynyl, halo,
straight or branched chain haloalkyl, pseudohalo, azido, cyano, nitro,
OR⁶⁰, NR⁶⁰R⁶¹, COOR⁶⁰, C(O)R⁶⁰, C(O)NR⁶⁰R⁶¹, S(O)qR⁶⁰, S(O)qOR⁶⁰,
S(O)qNR⁶⁰R⁶¹, NR⁶⁰C(O)R⁶¹, NR⁶⁰C(O)NR⁶⁰R⁶¹, NR⁶⁰S(O)qR⁶⁰, SiR⁶⁰R⁶¹R⁶²,
P(R⁶⁰)₂, P(O)(R⁶⁰)₂, P(OR⁶⁰)₂, P(O)(OR⁶⁰)₂, P(O)(OR⁶⁰)(R⁶¹) and P(O)NR⁶⁰R⁶¹,
where q is an integer from 0 to 2;
each R⁶⁰, R⁶¹, and R⁶² is independently hydrogen, straight or
branched chain alkyl, straight or branched chain alkenyl, straight or
branched chain alkynyl, aryl, straight or branched chain aralkyl, straight or
branched chain aralkenyl, straight or branched chain aralkynyl, heteroaryl,
straight or branched chain heteroaralkyl, straight or branched chain
heteroaralkenyl, straight or branched chain heteroaralkynyl, heterocyclyl,
straight or branched chain heterocyclylalkyl, straight or branched chain
heterocyclylalkenyl or straight or branched chain heterocyclylalkynyl.

Fluorescent, colorimetric and phosphorescent groups are known to
those of skill in the art (see, e.g., U.S. Patent No. 6,274,337; Sapan et
al. (1999) Biotechnol. Appl. Biochem. 29 (Pt. 2):99-108; Sittampalam et
al. (1997) Curr. Opin. Chem. Biol. 1(3):384-91; Lakowicz, J. R.,
Principles of Fluorescence Spectroscopy, New York: Plenum Press
(1983); Herman, B., Resonance Energy Transfer Microscopy, in:
Fluorescence Microscopy of Living Cells in Culture, Part B, Methods in
Cell Biology, vol. 30, ed. Taylor, D. L. & Wang, Y.-L., San Diego:
Academic Press (1989), pp. 219-243; Turro, N. J., Modern Molecular
Photochemistry, Menlo Park: Benjamin/Cummings Publishing Co., Inc.
(1978), pp. 296-361 and the Molecular Probes Catalog (1997), OR,
USA). Fluorescent moieties include, but are not limited to, 1- and
2-aminonaphthalene, p,p'-diaminostilbenes, pyrenes, quaternary
phenanthridine salts, 9-aminoacridines, p,p'-diaminobenzophenone imines,
anthracenes, oxacarbocyanine, merocyanine, 3-aminoequilenin, perylene,
bis-benzoxazole, bis-p-oxazolyl benzene, 1,2-benzophenazin, retinol,
bis-3-aminopyridinium salts, hellebrigenin, tetracycline, sterophenol,
benzimidazolylphenylamine, 2-oxo-3-chromen, indole, xanthen,
7-hydroxycoumarin, phenoxazine, salicylate, strophanthidin, porphyrins,
triarylmethanes and flavin. Fluorescent compounds that have
functionalities for linking to a compound provided herein, or that can be
modified to incorporate such functionalities include, e.g., dansyl chloride;
fluoresceins such as 3,6-dihydroxy-9-phenylxanthhydrol; rhodamine
isothiocyanate; N-phenyl 1-amino-8-sulfonatonaphthalene; N-phenyl
2-amino-6-sulfonatonaphthalene; 4-acetamido-4-isothiocyanato-stilbene-
2,2'-disulfonic acid; pyrene-3-sulfonic acid; 2-toluidinonaphthalene-6-
sulfonate; N-phenyl-N-methyl-2-aminonaphthalene-6-sulfonate; ethidium
bromide; stebrine; auromine-0,2-(9'-anthroyl)palmitate; dansyl
phosphatidylethanolamine; N,N'-dioctadecyl oxacarbocyanine;
N,N'-dihexyl oxacarbocyanine; merocyanine, 4-(3'-pyrenyl)stearate;
d-3-aminodesoxy-equilenin; 12-(9'-anthroyl)stearate; 2-methylanthracene;
9-vinylanthracene; 2,2'(vinylene-p-phenylene)bisbenzoxazole;
p-bis(2-(4-methyl-5-phenyl-oxazolyl))benzene;
6-dimethylamino-1,2-benzophenazin; retinol;
bis(3'-aminopyridinium) 1,10-decandiyl diiodide;
sulfonaphthylhydrazone of hellibrienin; chlorotetracycline;
N-(7-dimethylamino-4-methyl-2-oxo-3-chromenyl)maleimide;
N-(p-(2-benzimidazolyl)-phenyl)maleimide; N-(4-fluoranthyl)maleimide;
bis(homovanillic acid); resazarin; 4-chloro-7-nitro-2,1,3-benzooxadiazole;
merocyanine 540; resorufin; rose bengal; and 2,4-diphenyl-3(2H)-
furanone. Many fluorescent tags are commercially available from SIGMA
chemical company (Saint Louis, Mo.), Molecular Probes, R&D systems
(Minneapolis, Minn.), Pharmacia LKB Biotechnology (Piscataway, N.J.),
CLONTECH Laboratories, Inc. (Palo Alto, Calif.), Chem Genes Corp.,
Aldrich Chemical Company (Milwaukee, Wis.), Glen Research, Inc.,
GIBCO BRL Life Technologies, Inc. (Gaithersburg, Md.), Fluka Chemica-
Biochemika Analytika (Fluka Chemie AG, Buchs, Switzerland), and
Applied Biosystems (Foster City, Calif.) as well as other commercial
sources known to one of skill in the art.

Chemiluminescent groups intended for use herein include any
components of light generating systems that are catalyzed by a
peroxidase and require superoxide anion (O₂⁻) (and/or hydrogen peroxide
(H₂O₂)) (see, e.g., Musiani et al. (1998) Histol. Histopathol. 13(1):243-8).
Light-generating systems include, but are not limited to, luminol,
isoluminol, peroxyoxalate-fluorophore, acridinium ester, lucigenin,
dioxetanes, oxalate esters, acridan, hemin, indoxyl esters including
3-O-indoxyl esters, naphthalene derivatives, such as 7-dimethylamino-
naphthalene-1,2-dicarbonic acid hydrazide and Cypridina luciferin analogs,
including 2-methyl-6-[p-methoxyphenyl]-3,7-dihydroimidazo[1,2-a]pyrazin-
3-one, 2-methyl-6-phenyl-3,7-dihydroimidazo[1,2-a]pyrazin-3-one and
2-methyl-6-[p-[2-[sodium 3-carboxylato-4-(6-hydroxy-3-xanthenon-9-
yl)phenylthioureylene]ethyleneoxy]phenyl]-3,7-dihydroimidazo[1,2-
a]pyrazin-3-one. In other embodiments, the chemiluminescent moieties
intended for use herein include, but are not limited to, luminol, isoluminol,
N-(4-aminobutyl)-N-ethyl isoluminol (ABEI), N-(4-aminobutyl)-N-methyl
isoluminol (ABMI), which have the following structures and participate in
the following reactions:

[Chemical structures and reaction scheme bearing substituents R and R¹
are not reproduced here.]

where luminol is represented when R is NH₂ and R¹ is H; isoluminol,
when R is H and R¹ is NH₂; ABEI (6-[N-(4-aminobutyl)-N-ethylamino]-
2,3-dihydrophthalazine-1,4-dione), when R is H and R¹ is C₂H₅-N-
(CH₂)₄NH₂; and ABMI (6-[N-(4-aminobutyl)-N-methylamino]-2,3-
dihydrophthalazine-1,4-dione), when R is H and R¹ is CH₃-N-(CH₂)₄NH₂.

Bioluminescent groups for use herein include luciferase/luciferin
couples, including firefly (Photinus pyralis) luciferase, and the Aequorin
system (i.e., the purified jellyfish photoprotein, aequorin). Many
luciferases and substrates have been studied and well-characterized and
are commercially available (e.g., firefly luciferase is available from Sigma,
St. Louis, MO, and Boehringer Mannheim Biochemicals, Indianapolis, IN;
recombinantly produced firefly luciferase and other reagents based on this
gene or for use with this protein are available from Promega Corporation,
Madison, WI; the aequorin photoprotein luciferase from jellyfish and
luciferase from Renilla are commercially available from Sealite Sciences,
Bogart, GA; coelenterazine, the naturally-occurring substrate for these
luciferases, is available from Molecular Probes, Eugene, OR). Other
bioluminescent systems include crustacean, such as Cypridina (Vargula),
systems; insect bioluminescence generating systems including fireflies,
click beetles, and other insect systems; bacterial systems; dinoflagellate
bioluminescence generating systems; systems from molluscs, such as
Latia and Pholas; earthworms and other annelids; glow worms; marine
polychaete worm systems; South American railway beetle; fish (i.e.,
those found in species of Aristostomias, such as A. scintillans (see, e.g.,
O'Day et al. (1974) Vision Res. 14:545-550), Pachystomias, and
Malacosteus, such as M. niger; blue/green emitters include Cyclothone,
myctophids, hatchet fish (Argyropelecus), Vinciguerria, Howella,
Florenciella, and Chauliodus); and fluorescent proteins, including green
(i.e., GFPs, including those from Renilla and from Ptilosarcus), red and
blue (i.e., BFPs, including those from Vibrio fischeri, Vibrio harveyi or
Photobacterium phosphoreum) fluorescent proteins (including Renilla
mulleri luciferase, Gaussia species luciferase and Pleuromamma species
luciferase) and phycobiliproteins.

Exemplary selectivity functions include, but are not limited to,
ligands that bind to receptors such as insulin and other receptors (see,
e.g., the Table of ligands below); cyclodextrins; enzyme substrates; lipid
structures; prostaglandins; antibiotics; steroids; therapeutic drugs;
enzyme inhibitors; transition state analogs; specific peptides that bind to
biomolecule surfaces, including glue peptides; lectins (e.g., mannose
type, lactose type); peptide mimetics; statins; functionalities, such as
dyes and other compounds and moieties employed for protein purification
and affinity chromatography. See, e.g., Figure 17, and the following table of
exemplary peptide ligands.



Exemplary peptide ligands

Designation | Sequence | SEQ ID
Adrenocorticotropic hormone | SYSMEHFRWG KPVGKKRRPV KVYPNGAEDE SAEAFPLEF | 1
Adrenomedullin | YRQSMNNFQG LRSFGCRFGT CTVQKLAHQI YQFTDKDKDN VAPRSKISPQ GY | 2
Allatostatin I-IV | APSGAQRLYGFGL | 3
alpha MSH | WGKPV(ac)SYSMEHFR | 4
alpha-Bag Cell Peptide | APRERFYSE | 5
alpha-Neo-endorphin | YGGFLRKYPK | 6
Alytesin | E*GRLGTQWAVGHLM-NH₂ | 7
Amylin | KCNTATCATN RLANFLVHSS NNFGAILSST NVGSNTY | 8
Angiotensin-1 | DRVYIHPFHL | 9
Angiotensin-2 | DRVYIHPF | 10
Angiotensin-3 | RVYIHPF | 11
Apelin-13 | NRPRLSHLGPMPF | 12
Astressin | *FHLLREVLE*IARAEQLAQEAHKNRL*IEII | 13
Atrial Natriuretic Peptide | SLRRSSCFGG RMDRIGAQSG LGCNSFRY | 14
Autocamtide 2 | KKALRRQETV DAL | 15
BAM 12 | YGGFMRRVGR PE | 16
BAM 18 | YGGFMRRVGR PEWW | 17
BAM 22 | YGGFMRRVGR PE | 18
Beta Endorphins ("44") | YGGFMTSEKS QTPLVTLFKN AIIKNAYKKG E | 19
beta MSH | AEKKDEGPYR MEHFRWGSPP KD | 20
beta-Neo-endorphin | YGGFLRKYP | 21
Beta Amyloid | DAEFRHASGYE VHHQKLVFFAE DVGSIMLGAIIG LMVGGWIAT | 22
Beta-Bag Cell Peptide | RLRFH | 23
BNP | SPKMVQGSGC FGRKMDRISS [illegible] RH | 24


Bradykinin | RPPGFSPFR | 25
Buccalin | GMDSLAFSGG L-NH₂ | 26
Bursin | KHG-NH₂ | 27
C3 (undeca peptide) | ASKKPKRNIKA | 28
Caerulein | *EQDY(SO₃H)TGWMDF | 29
Calcineurin AIP | ITSFEEAKGL DRINERMPPR RDAMP | 30
Calcitonin | CGNLSTCMLG TYTQDFNKFH TFPQTAIGVG AP | 31
Calpain Inhibitor (42) | DPMSSTYIEE LGKREVTIPP KYRELLA | 32
CAP-37 | NQGRHFCGGA EIHARFVMTA ASCFN | 33
Cardiodilatin | *NPMYNAVSNA DLMDFKNLLD HLEEKMPLED | 34
CD36 peptide P (139-155) | CNLAVAAASH IYQNQFVQ | 35
Cecropin B | KWKVFKKIEK MGRNIRNGIV KAGPAIAVLG [illegible] | 36
Cerebellin | SGSAKVAFSA IRSTNH | 37
CGRP-1 | ACDTATCVTH RLAGLLSRSG GVVKNNFVPT NVGSKAF | 38
CGRP-2 | ACNTATCVTH RLAGLLSRSG GMVKSNFVPT NVGSKAF | 39
CKS-17 | LQNRRGLDLL FLKEGGL | 40
Cortistatin | QEGAPPQQSA RRDRMPCRNF FWKTFSSCK | 41
[illegible] | [illegible] | 42
Defensin 1 HNP1 | ACYCRIPACI AGERRYGTCI YQGRLWAFCC | 43
Defensin HNP2 | CYCRIPACIA GERRYGTCIY QGRLWAFCC | 44
Dermaseptin | ALGAAADTIS QTQ | 45
Dynorphin A | YGGFLRRIRP KLKWDNQ | 46
Dynorphin B | YGGFLRRQFK VVT | 47
Eledoisin | E*PSKDAFIGLM-NH₂ | 48
Endomorphin-1 | YPWF | 49
Endomorphin-2 | YPFF | 50
Endothelin-1 | CSCSSLMDKE CVYFCHLDII W | 51
Exendin-4 | HSDGTFTSDL SKQMEEEAVR LFIEWLKWGG PSSGAPPPS(NH₂) | 52
Fibrinopeptide A | ADSGEGDFLA EGGGVR | 53
Fibrinopeptide B | QGVNDNEEGF FSAR | 54
Fibronectin CS1 | EILDVPST | 55
FMRF | FMRF | 56
Galanin | GWTLNSAGYL LGPHAVGNHR SFSDKNGLTS | 57
Galantide | GWTLNSAGYL LGPQQFFGLM(NH₂) | 58
gamma-Bag Cell Peptide | RLRFD | 59
Gastrin | EGPWLEEEEE AYGWMDF | 60
Gastrin Releasing | VPLPAGGGTV LTKMYPRGNH WAVGHLM | 61
Ghrelin | GSSFLSPEHQ RVQQRKESKK PPAKLQPR | 62
GIP | YAEGTFISPY SIAMDKIHQQ DFVNWLLAQK GKKNPWKHNI TQ | 63

Glucagon | HSQGTFTSDY SKYLDSRRAQ DFVDWLMNT | 64
Grb-7 SH2 domain-1 | RRFA C DPDG YPN YFH C VPGG | 65
Grb-7 SH2 domain-10 | TGSW C GLMH YPN AWL C NTQG | 66
Grb-7 SH2 domain-11 | RSKW C RPGY YAN YPQ C WTQG | 67
Grb-7 SH2 domain-18 | RSTL C WFEG YPN TFP C KYFR | 68
Grb-7 SH2 domain-2 | RVQE C KYLY YPN PYL C KPPG | 69
Grb-7 SH2 domain-23 | GLRR C LYGP YPN AWV C NIHE | 70
Grb-7 SH2 domain-3 | KLFW C TYEP YAN EWP C PGYS | 71
Grb-7 SH2 domain-34 | FCAV C NEEL YEN CGG C SCGK | 72
Grb-7 SH2 domain-46 | RTSP C GYIG YPN IFE C TYLG | 73
Grb-7 SH2 domain-5 | TGEW C AQSV YAN YPN C KSAW | 74
Grb-7 SH2 domain-6 | NVSR C TYIH YPN WSL C GVEV | 75
Grb-7 SH2 domain-8 | GVSN C VFWG YAN PWL C SPYS | 76
Growth hormone releasing factor | YAPAIFTNSY RKVLGQLSAR KLLQPIMSRQ QGESNQERGA RARL | 77
Guanylin | PGTCEICAYA ACTGC | 78
Helodermin | HSPAIFTEEY SKLLAKLALQ KYLASILGSK TSPPP-NH₂ | 79
Helospectin-1 | HSPATFTAEY SKLLAKLALQ KYLESILGSS TSPRPPSS | 80
Helospectin-2 | HSPATFTAEY SKLLAKLALQ KYLESILGSS TSPRPPS | 81
Histatin 5 | PSHAKRHHGY KRKFHEKHHS HRGY | 82
ICE inhibitor (III) | ac-YVAP-fluoroacyloxymethylketone | 83
Immunostimulating Peptide | VEPIPY | 84
Insulin (A-chain) | GIVEQCCTSI CSLYQLENYC N | 85
Insulin (B-chain) | FVNQHLCGSH LVEALYLVCG ERGFFYTPKT | 86
Insulin (whole molecule) | see above | 87
Kinetensin | IAKRHPYFL | 88
Leu-Enkephalin | YGGFL | 89
Litorin | E*QWAVGHFM-NH₂ | 90
Malantide | RTKRSGSVYE PLKI | 91
Met-Enkephalin | YGGFM | 92
Metorphamide | YGGGFMRRV-NH₂ | 93
Motilin | FVPIFTYGEL QRMQEKERNK GQ | 94
Myomodulin | PMSMLRL-NH₂ | 95
Myosin Kinase | IPKKRAARATS-NH₂ | 96
Necrofibrin | GAVSTA | 97



Neuropeptide Y | YPSKPDNPGE DAPAEDMARY YSAKRHYINL ITRQRY-NH₂ | 98
Neurotensin | E*LYENKPRRPYIL | 99
Nociceptin | FGGFTGARKS ARKLANQ | 100
Nociceptin/Orphanin FQ | FAEPLPSEEE GESYSKEVPE MEKRYGGFMR F | 101
Nocistatin | EQKQLQ | 102
Orexin A | E*PLPDCCRQKTCSCRLYELLHGAGNHAAGILTL-NH₂ | 103
Orexin B | RSGPPGLQGR LQRLLQASGN HAAGILTM-NH₂ | 104
Osteocalcin | YLYQWLGAPV PYPDPLEPRR EVCELNPDCD ELADHIGFQE AYRRFYGPV | 105
Oxytocin | CYIQNCPLG-NH₂ | 106
PACAP | HSDGIFTPSY SRYRKQMAVK KYLAAVL | 107
PACAP-RP | DVAHGILNEA YRKVLDQLSA GKHLQSLVA | 108
Pancreatic Polypeptide | APLEPVYPGD NATPEQMAQY AADLRRYINM LTRPRY-NH₂ | 109
Papain Inhibitor | GGYR | 110
Peptide | YGGFMRRVGR PE | 111
Peptide YY | YPIKPEAPGE DASPEELNRY YASLRHYLNL VTRQRY-NH₂ | 112
Phosphate acceptor | RRKASGPPV | 113
Physalaemin | E*ADPNKFYGLM-NH₂ | 114
Ranatensin | E*VPQWAVGHFM-NH₂ | 115
RGD peptides | | 116
Rigin | GQPR | 117
RR-SRC | RRLIEDAEYA ARG | 118
Schizophrenia | RPTVL | 119
Secretin | HSDGTFTSEL SRLREGARLQ RLLQGLV | 120
Serum Thymic Factor | E*AKSQGGSN | 121
structural-site zinc ligands-alpha | PQCGKCRICK NPESNYCLK | 122
structural-site zinc ligands-beta | PQCGKCRVCK NPESNYCLK | 123
structural-site zinc ligands-gamma | PQCGKCRICK NPESNYCLK | 124
structural-site zinc ligands-pi | PLCRKCKFCLSPLTNLCGK | 125
structural-site zinc ligands-X | PQGECKFCLNPKTNLCQI | 126
Substance P | RPKPQQFFGL M-NH₂ | 127
Syntide 2 | PLARTLSVAG LPGKK | 128
Systemin | AVQSKPPSKR DPPKMQTD | 129
Thrombin-light chain | TFGSGEADCG LRPLFEKKSL EDKTERELLE SYIDGR | 130
Thymopentin | RKDVY | [illegible]
Thymus Factor | QAKSQGGSN | [illegible]
TRH | E*HP | 136


Tuftsin | TKPR | 137
Uperolein | E*PDPNAFYGLM-NH₂ | 138
Uremic Pentapeptide | DLWQK | 139
Urocortin | DNPSLSIDLT FHLLRTLLEL ARTQSQRERA EQNRIIFDSV | 140
Uroguanylin | NDDCELCVNV ACTGCL | 141
Vasonatrin | GLSKGCFGLK LDRIGSMSGL GCNSFRY | 142
Vasopressin | CYFQNCPRG | 143
Vasotocin | CYIQNCPRG | 144
VIP | HSDAVFTDNY TRLRKQMAVK KYLNSILN | 145
Xenin | MLTKFETKSA RVKGLSFHPK RPWIL | 146
YXN motif | Tyr-X-Asn | 147
Zinc ligand of carbonic anhydrase I | FQFHFHWGS | 148
Zinc ligand of carbonic anhydrase II | IQFHFHWGS | 149


Other selections for Y can be identified by those of skill in the
art and include, for example, those disclosed in Techniques in Protein
Chemistry, Vol. 1 (1989) T.E. Hugli ed. (Academic Press); Techniques in
Protein Chemistry, Vol. 5 (1994) J.W. Crabb ed. (Academic Press);
Lundblad Techniques in Protein Modification (1995) (CRC Press, Boca
Raton, FL); Glazer et al. (1976) Chemical Modification of Proteins (North
Holland (Amsterdam))(American Elsevier, New York); and Hermanson
(1996) Bioconjugate Techniques (Academic Press, San Diego, CA).

4. Sorting Functions "Q" 

The compounds provided herein can include a sorting function
("Q"), which permits the compounds to be addressed, such as by capture
in a 2-D array. The sorting functions are "tags," such as oligonucleotide
tags, such that when the compounds are bathed over an array of
complementary oligonucleotides linked to solid supports, such as beads or
chips, under suitable binding conditions, the oligonucleotides hybridize.
The identity of the capture compound can be known by virtue of its
position in the array. Other sorting functions can be optically coded,
including color-coded or bar-coded beads that can be separated, or
electronically tagged, such as by providing microreactor supports with
electronic tags or bar coded supports (see, e.g., U.S. Patent No.
6,025,129; U.S. Patent No. 6,017,496; U.S. Patent No. 5,972,639; U.S.
Patent No. 5,961,923; U.S. Patent No. 5,925,562; U.S. Patent No.
5,874,214; U.S. Patent No. 5,751,629; U.S. Patent No. 5,741,462), or
chemical tags (see, e.g., U.S. Patent No. 5,432,018; U.S. Patent No.
5,547,839) or colored tags or other such addressing methods that can be
used in place of physically addressable arrays. The sorting function is
selected to permit physical arraying or other addressable separation
method suitable for analysis, particularly mass spectrometric, including
MALDI, analysis.
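
The addressing principle described above can be sketched in code. The following is a minimal illustration only, assuming perfect Watson-Crick complementarity and exact-match lookup; the tag sequences, the function names and the two-column array layout are hypothetical and are not taken from the examples herein.

```python
# Hypothetical sketch: each oligonucleotide tag Q is matched to the array
# position that carries its reverse-complementary probe.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return sequence.translate(COMPLEMENT)[::-1]


def build_array(tags, columns):
    """Map the probe complementary to each tag to a (row, column) position."""
    return {
        reverse_complement(tag): divmod(index, columns)
        for index, tag in enumerate(tags)
    }


def locate(tag, array):
    """Return the position at which a tag would be captured, or None."""
    return array.get(reverse_complement(tag))


# Assumed 11-mer tags that differ only in a central variable region.
tags = ["GTCAAACGTCA", "GTCAAAGGTCA", "GTCAAATGTCA", "GTCAACAGTCA"]
array = build_array(tags, columns=2)
position = locate("GTCAAATGTCA", array)
```

In this sketch the identity of a captured compound follows from its position alone. Actual hybridization tolerates some mismatches, which exact dictionary lookup does not model.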

Other sorting functions for use in the compounds provided herein
include biotin, 6-His, BODIPY, oligonucleotides, nucleosides, nucleotides,
antibodies, immunotoxin conjugates, adhesive peptides, lectins,
liposomes, PNA, activated dextrans and peptides. In one embodiment,
the sorting function is an oligonucleotide, particularly, either a single-
stranded or partially single-stranded oligonucleotide to permit hybridization
to single-stranded regions on complementary oligonucleotides on solid
supports.

In one embodiment of the capture compounds provided herein, Q is
a single stranded unprotected or suitably protected oligonucleotide or
oligonucleotide analog (e.g., PNA) of up to 50 building blocks, which is
capable of hybridizing with a base-complementary single stranded nucleic
acid molecule. In certain embodiments, Q contains from about 5 up to
about 10, 15, 25, 30, 35, 40, 45 or 50 building blocks.

Biomolecule mixtures, including, but not limited to, protein
mixtures, can have different hydrophobicities (solubility) than the
compounds provided herein. In certain embodiments, in order to achieve
high reaction yields between the functionality X on the compounds
provided herein and the protein surface, the reaction is performed in
solution. In other embodiments, the reaction is performed at a solid/liquid
or liquid/liquid interface. In certain embodiments, the solubility properties
of the compounds provided herein are dominated by the Q moiety. A
change in the structure of Q can, in these embodiments, accommodate
different solubilities. For example, if the protein mixture is very water
soluble, Q can have natural phosphodiester linkages; if the biomolecule
mixture is very hydrophobic (lipids, glycolipids, membrane proteins,
lipoproteins), Q can have its phosphodiester bonds protected as
phosphotriesters, or alternatively, these bonds can be methylphosphonate
diesters or peptide nucleic acids (PNAs). If the biomolecule
mixture is of an intermediate hydrophobicity, solubility is achieved, e.g.,
with phosphorothioate diester bonds. Intermediate solubility also can be
attained by mixing phosphodiester with phosphotriester linkages. Those
skilled in the art can easily conceive of other means to achieve this goal,
including, but not limited to, addition of substituents on Z, as described
elsewhere herein, or use of beads for Z that are hydrophobic, including,
but not limited to, polystyrene, polyethylene, polypropylene or Teflon, or
hydrophilic, including, but not limited to, cellulose, dextran cross-linked
with epichlorohydrin (e.g., Sephadex®), agarose (e.g., Sepharose®), lectins,
adhesive polypeptides, and polyacrylamides.
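
The matching of the Q backbone to the hydrophobicity of the mixture, as set out above, can be summarized as a simple lookup. This is an illustrative sketch only; the three category labels and the function name are assumptions introduced here, and only the backbone chemistries themselves come from the preceding paragraph.

```python
# Hypothetical lookup: the category labels are illustrative assumptions.
BACKBONE_OPTIONS = {
    "water-soluble": ["phosphodiester"],
    "intermediate": [
        "phosphorothioate diester",
        "mixed phosphodiester/phosphotriester",
    ],
    "hydrophobic": [
        "phosphotriester",
        "methylphosphonate diester",
        "peptide nucleic acid (PNA)",
    ],
}


def backbone_options(mixture_character):
    """Return candidate Q backbone linkages for a given mixture character."""
    if mixture_character not in BACKBONE_OPTIONS:
        raise ValueError("unknown mixture character: " + mixture_character)
    return BACKBONE_OPTIONS[mixture_character]


candidates = backbone_options("hydrophobic")
```

A lookup of this kind covers only the backbone of Q; substituents on Z or the choice of bead material, also mentioned above, would be handled separately.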

The flexibility of being able to change the solubility of the
compounds is a significant advantage over current methods. In contrast,
2D gel electrophoresis is useful only for analysis of water soluble proteins
with the result that about 30 to 35% of all cellular proteins, such as
those residing in the cell membrane, cannot be analyzed by this method.
This is a severe limitation of 2D gel electrophoresis since many proteins,
including, but not limited to, those involved in tissue specific cell-cell
contacts, signal transduction, ion channels and receptors, are localized in
the cell membrane.

In one embodiment, after reaction or complexation of the
compounds provided herein with a biomolecule, including, but not limited
to, a protein, the compounds are brought into contact with a set of
spatially resolved complementary sequences on a flat support, beads or
microtiter plates under hybridization conditions.

In certain embodiments, Q is a monovalent oligonucleotide or
oligonucleotide analog group that is at least partially single stranded or
includes a region that can be single-stranded for hybridization to
complementary oligonucleotides on a support. Q can have the
formula:
N¹ₘ-Bᵢ-N²ₙ-

where N¹ and N² are regions of conserved sequences; B is a region of
sequence permutations; m, i and n are the number of building blocks in
N¹, B and N², respectively; and the sum of m, n and i is a number of units
able to hybridize with a complementary nucleic acid sequence to form a
stable hybrid. Thus, in embodiments where B is a single stranded DNA or
RNA, the number of sequence permutations is equal to 4ⁱ. In one
embodiment, the sum of m, n and i is about 5 up to about 10, 15, 25,
30, 35, 40, 45 or 50. In certain embodiments m and n are each
independently 0 to about 48, or are each independently about 1 to about
25, or about 1 to about 10 or 15, or about 1 to about 5. In other
embodiments, i is about 2 to about 25, or is about 3 to about 12, or is
about 3 to about 5, 6, 7 or 8.
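
The relationship between the length i of the variable region B and the number of distinct sorting sequences can be illustrated with a short enumeration. The sketch below assumes a four-letter DNA alphabet; the flanking sequences "GTCA" are hypothetical placeholders for the conserved regions and are not the sequences of the examples herein.

```python
from itertools import product


def sorting_sequences(n1, n2, i, alphabet="ACGT"):
    """Yield every N1-B-N2 sequence for a variable region B of length i."""
    for letters in product(alphabet, repeat=i):
        yield n1 + "".join(letters) + n2


# Assumed tetranucleotide flanks (m = n = 4) around a trinucleotide B (i = 3).
library = list(sorting_sequences("GTCA", "GTCA", 3))
count = len(library)
```

With i = 3 the enumeration yields 4 × 4 × 4 = 64 sequences, each 11 building blocks long. Passing an empty string for either flank corresponds to the embodiments in which a conserved region is absent.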

The oligonucleotide portion, or oligonucleotide analog portion, of
the compounds (N¹ₘ-Bᵢ-N²ₙ-) can be varied to allow optimal size for
binding and sequence recognition. The diversity of the sequence
permutation region B can be relatively low if the biomolecule mixture,
including, but not limited to, protein mixtures, is of low complexity. If
the mixture is of high complexity, the sequence region B has to be of high
diversity to afford sufficient resolving power to separate all the species.
The flanking conserved regions N¹ₘ and N²ₙ need only be long enough to
provide for efficient and stable hybrid formation. There is, however,
flexibility in designing these regions: N¹ₘ and N²ₙ can be of the same
length and same sequence, of the same length and different sequence or
of different length and different sequence. In certain embodiments,
including those where B is of sufficient length to provide stable hybrid
formation, N¹ and/or N² are absent. In these embodiments, the
oligonucleotide portion of the compounds, or oligonucleotide analog
portion of the compounds, has the formula N¹ₘ-Bᵢ-, or Bᵢ-N²ₙ-, or Bᵢ-.

In an exemplary embodiment (see, e.g., EXAMPLE 1.a.), B has a
trinucleotide sequence embedded within an 11-mer oligonucleotide
sequence, where the N¹ₘ and N²ₙ tetranucleotide sequences provide
flanking identical (conserved) regions. This arrangement for N¹ₘ-Bᵢ-N²ₙ-
affords 64 different compounds where each compound carries the same
reactive functionality X. In another exemplary embodiment (see, e.g.,
EXAMPLE 1.b.), B has a tetranucleotide sequence embedded within a 12-
mer oligonucleotide sequence, where the N¹ₘ and N²ₙ oligonucleotide
sequences provide flanking but not identical octanucleotide sequences.
This arrangement for N¹ₘ-Bᵢ-N²ₙ- affords 256 different compounds where
each carries the same reactive functionality X. In a further exemplary
embodiment (see, e.g., EXAMPLE 1.c.), B has an octanucleotide
sequence embedded within a 23-mer oligonucleotide sequence, where the
N¹ₘ and N²ₙ oligonucleotide sequences provide flanking but not identical
octanucleotide sequences. This arrangement for N¹ₘ-Bᵢ-N²ₙ- affords
65,536 different compounds where each carries the same reactive
functionality X, and exceeds the estimated complexity of the human
proteome (e.g., 30,000-35,000 different proteins). In certain
embodiments, a B with excess permutations relative to the complexity of
the protein mixture is used, as the oligonucleotides with the best
hybridization properties can then be selected for analysis to reduce
mismatching.
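
The choice of the length of B for a mixture of given complexity can be worked through numerically. The following sketch is an illustration under stated assumptions: it treats each species as requiring one distinct permutation, and the function name is hypothetical.

```python
def min_variable_length(n_species, alphabet_size=4):
    """Return the smallest i for which alphabet_size**i >= n_species."""
    i = 0
    capacity = 1
    while capacity < n_species:
        capacity *= alphabet_size
        i += 1
    return i


# Upper estimate of proteome complexity quoted in the text above.
required_i = min_variable_length(35000)
capacity = 4 ** required_i
```

For 35,000 species a heptanucleotide B is insufficient (4⁷ = 16,384), whereas an octanucleotide B provides 65,536 permutations, consistent with the third exemplary embodiment above. The surplus leaves room to discard tags with poor hybridization properties.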

5. Solubility Functions "W" 

The compounds provided herein can include a solubility function,
W, to confer desired solubility properties, such as solubility in
hydrophobic environments or hydrophilic environments to permit probing
of biomolecules in physiological environments, such as in membranes.
Exemplary solubility functions for use in the compounds provided herein
include polyethylene glycols, sulfates, polysulfates, phosphates,
sulfonates, polysulfonates, carbohydrates, dextrin, polyphosphates,
poly-carboxylic acids, triethanolamine, alcohols, water soluble polymers,
salts of alkyl and aryl carboxylic acids and glycols.

Amphiphilic compounds, such as quaternary ammonium salts (i.e.,
betaine, choline, sphingomyelin, tetramethyl (or tetrabutyl) alkyl
ammonium salts), and cationic, ionic and neutral tensides, may also be
used as the solubility function W.

In other embodiments, W also can be used to modulate the
solubility of the compounds to achieve homogeneous solutions, if desired,
when reacting with biomolecule mixtures, including, but not limited to,
protein mixtures. In certain embodiments, W is a sulfonate, a polar
functionality that can be used to make the compounds more water-
soluble. In other embodiments, W is a hydrophobic group, including
lower alkyl, such as tert-butyl, tert-amyl, isoamyl, isopropyl, n-hexyl,
sec-hexyl, isohexyl, n-butyl, sec-butyl, iso-butyl and n-amyl, or an aryl
group, including phenyl or naphthyl.

6. Exemplary Embodiments 

The following provides exemplary capture compounds that exhibit
the above-described properties. It is understood that these are exemplary
only and that any compounds that can react covalently with a
biomolecule, or bind it by another highly stable interaction that is
stable to analytic conditions, such as those of mass spectrometric
analysis, and that can be sorted or otherwise identified are contemplated
for use in the collections.
a. Exemplary embodiment 1

In one embodiment, the compounds for use in the methods
provided herein have formulae:

Q-Z-X or Q-Z-Y,

where Q is a sorting function that contains a single-stranded unprotected
or suitably protected oligonucleotide or oligonucleotide analog (e.g.,
peptide nucleic acid (PNA)) of up to 50 building blocks, which is capable
of hybridizing with a base-complementary single-stranded nucleic acid
molecule;

Z is a moiety that is cleavable prior to or during analysis of a
biomolecule, including mass spectral analysis, without altering the
structure of the biomolecule, including, but not limited to, a protein;

X is a reactivity functional group that interacts with and/or reacts
with functionalities on the surface of a biomolecule, including, but not
limited to, a protein, to form covalent bonds or bonds that are stable
under conditions of mass spectrometric analysis, particularly MALDI
analysis; and

Y is a selectivity functional group that interacts with and/or reacts
by imposing unique selectivity by introducing functionalities that interact
noncovalently with target proteins.
b. Exemplary embodiment 2

In another embodiment, the compounds for use in the methods
provided herein have formula:

Q-Z-X
  |
  Y

where Q is a single-stranded unprotected or suitably protected
oligonucleotide or oligonucleotide analog (e.g., peptide nucleic acid (PNA))
of up to 50 building blocks, which is capable of hybridizing with a
base-complementary single-stranded nucleic acid molecule;

Z is a moiety that is cleavable prior to or during analysis of a
biomolecule, including mass spectral analysis, without altering the
structure of the biomolecule, including, but not limited to, a protein;

X is a functional group that interacts with and/or reacts with
functionalities on the surface of a biomolecule, including, but not limited
to, a protein, to form covalent bonds or bonds that are stable under
conditions of mass spectrometric analysis, particularly MALDI analysis;
and

Y is a functional group that interacts with and/or reacts by
imposing unique selectivity by introducing functionalities that interact
noncovalently with target proteins.
c. Exemplary embodiment 3

In another embodiment, the compounds for use in the methods
provided herein have formula:

Q-Z-X
  |
  Y

where Q is a sorting function that is a compound, or one or more
biomolecules (e.g., a pharmaceutical drug preparation, a biomolecule,
drug or other compound that immobilizes to the substrate and captures
target biomolecules), which is(are) capable of specific noncovalent
binding to a known compound to produce a tightly bound capture
compound;

Z is a moiety that is cleavable prior to or during analysis of a
biomolecule, including mass spectral analysis, without altering the
structure of the biomolecule, including, but not limited to, a protein;

X is a functional group that interacts with and/or reacts with
functionalities on the surface of a biomolecule, including, but not limited
to, a protein, to form covalent bonds or bonds that are stable under
conditions of mass spectrometric analysis, particularly MALDI analysis;
and

Y is a functional group that interacts with and/or reacts by
imposing unique selectivity by introducing functionalities that interact
noncovalently with target proteins.

d. Exemplary embodiment 4

In another embodiment, the compounds for use in the methods
provided herein have the formulae:

Q-Z-(X)_m
  |
  (Y)_n

or Q-Z-(X)_m or Q-Z-(Y)_n,

where Q, Z, X and Y are as defined above; m is an integer from 1 to 100,
in one embodiment 1 to 10, in another embodiment 1 to 3, 4 or 5; and n
is an integer from 1 to 100, in one embodiment 1 to 10, in another
embodiment 1 to 3, 4 or 5.

e. Other exemplary embodiments

In other embodiments, X is a pharmaceutical drug. The
compounds of these embodiments can be used in drug screening by
capturing biomolecules, including but not limited to proteins, which bind
to the pharmaceutical drug. Mutations in the biomolecules interfering
with binding to the pharmaceutical drug are identified, thereby
determining possible mechanisms of drug resistance. See, e.g., Hessler
et al. (November 9-11, 2001) Ninth Foresight Conference on Molecular
Nanotechnology (Abstract)
(http://www.foresight.org/Conferences/MNT9/Abstracts/Hessler/).

f. Other embodiments

In certain embodiments, the compounds provided herein have the
formula:

N^1_m-B_i-N^2_n-(S^1)_t-M(R^15)_a-(S^2)_b-L-X,

where N^1, B, N^2, S^1, M, S^2, L, X, m, i, n, t, a and b are as defined above.
In further embodiments, the compounds for use in the methods provided
herein include a mass modifying tag and have the formula:

N^1_m-B_i-N^2_n-(S^1)_t-M(R^15)_a-(S^2)_b-L-T-X,

where N^1, B, N^2, S^1, M, S^2, L, T, X, m, i, n, t, a and b are as defined
above.

In other embodiments, including those where Z is not a cleavable
linker, the compounds provided herein have the formula:

N^1_m-B_i-N^2_n-(S^1)_t-M(R^15)_a-(S^2)_b-X,

where N^1, B, N^2, S^1, M, S^2, X, m, i, n, t, a and b are as defined above.

In another embodiment, the compounds for use in the methods 
provided herein include those of formulae: 

[Structural formulae: drawings not reproduced.]

where L and M are each independently O, S or NR^3; X is a reactivity
function, as described above; Y is a selectivity function, as described
above; Q is a sorting function, as described above; and each R^3 is
independently hydrogen, substituted or unsubstituted alkyl, substituted or
unsubstituted alkenyl, substituted or unsubstituted alkynyl, substituted or
unsubstituted cycloalkyl, substituted or unsubstituted heterocyclyl,
substituted or unsubstituted aryl, substituted or unsubstituted heteroaryl,
substituted or unsubstituted aralkyl, or substituted or unsubstituted
heteroaralkyl.

In another embodiment, the capture compounds provided herein
have the formula:

[Structure drawing not reproduced; surviving substituent labels: L-Y, Q-M.]

where L, M, X, Y and Q are as defined above.
In another embodiment, the capture compounds provided herein
have the formula:

[Structure drawing not reproduced; surviving substituent labels:
Y-HN or O; O or NH-S^1-Q; O or NH-X; OMe.]

where X, Y, Q and S^1 are as defined above.
In another embodiment, the capture compounds provided herein
have the formula:

[Structure drawing not reproduced; surviving substituent labels:
Y-HN or O; O or NH-S^1-Q; O or NH-X; OMe.]

where Q, Y, X and S^1 are as defined above.
In another embodiment, the capture compounds provided herein
have the formula:

[Structure drawing not reproduced; surviving substituent labels:
Q-HN or O; (Y or W or H)O; O(Y or W or H); O(Y or W or H).]

where X, Y, Q and W are as defined above.

In another embodiment, the capture compounds for use in the
methods provided herein have the formulae:

[Structure drawings not reproduced; surviving substituent labels:
R, X; R, S^2-X; W-R, X; or W-R, S^2-X.]

where X, Y, Q and W are selected as above; and R is substituted or
unsubstituted alkyl, substituted or unsubstituted cycloalkyl, substituted or
unsubstituted cycloalkylalkyl, or substituted or unsubstituted aralkyl. In
another embodiment, R is selected from cyclohexyl, cyclohexyl-(CH2)n-,
isopropyl, and phenyl-(CH2)n-, where n is 1, 2 or 3. As shown in the
formulae above, R is optionally substituted with W.

In other embodiments, the compounds for use in the methods
provided herein include:

[Structure drawings not reproduced; surviving labels: OMe, OH, O, O-N.]

Specific compounds within these embodiments are those resulting
from all combinations of the groups listed above for the variables
contained in this formula and all can include Q groups. It is intended
herein that each of these specific compounds is within the scope of the
disclosure herein.

D. Preparation of the Capture Compounds

The capture compounds are designed by assessing the target
biomolecules and reaction conditions. For example, if the target
biomolecules are proteins, X functions suitable to effect covalent binding
or high-affinity binding to proteins are selected. Y is selected
according to the complexity of the target mixture and the desired
specificity of binding by X. Q is selected according to the number of
divisions of the mixture that are desired; and W is selected based upon
the environment of the biomolecules that is probed. A variety of capture
compounds are designed according to such criteria.

The capture compounds, once designed, can be synthesized by
methods available to those of skill in the art. Preparation of exemplary
capture compounds is described below. Any capture compound or similar
capture compound can be synthesized according to a method discussed
in general below or by minor modification of the methods by selecting
appropriate starting materials or by methods known to those of skill in the
art.

In general, the capture compounds can be prepared starting with the
central moiety Z. In certain embodiments, Z is
-(S^1)_t-M(R^15)_a-(S^2)_b-L-. In
these embodiments, the capture compounds can be prepared starting
with an appropriately substituted (e.g., with one or more R^15 groups) M
group. M(R^15)_a is optionally linked with S^1 and/or S^2, followed by linkage
to the cleavable linker L. Alternatively, the L group is optionally linked to
S^2, followed by reaction with M(R^15)_a, and optionally S^1. This Z group is
then derivatized on its S^1 (or M(R^15)_a) terminus to have a functionality for
coupling with an oligonucleotide or oligonucleotide analog Q (e.g., a
phosphoramidite, H-phosphonate, or phosphoric triester group). The Q
group will generally be N-protected on the bases to avoid competing
reactions upon introduction of the X moiety. In one embodiment, the Z
group is reacted with a mixture of all possible permutations of an
oligonucleotide or oligonucleotide analog Q (e.g., 4^i permutations where i
is the number of nucleotides or nucleotide analogs in B). The resulting Q-Z
capture compound or capture compounds is(are) then derivatized through
the L terminus to possess an X group for reaction with a biomolecule,
such as a protein. If desired, the N-protecting groups on the Q moiety
are then removed. Alternatively, the N-protecting groups can be removed
following reaction of the capture compound with a biomolecule, including
a protein. In other embodiments, Q can be synthesized on Z, including
embodiments where Z is an insoluble support or substrate, such as a
bead. In a further embodiment, Q is presynthesized by standard solid
state techniques, then linked to M. Alternatively, Q can be synthesized
stepwise on the M moiety.

Provided below are examples of syntheses of the capture
compounds provided herein containing alkaline-labile and photocleavable
linkers. One of skill in the art can prepare other capture compounds
within the scope of the disclosure by routine modification of the methods
presented herein, or by other methods known to those of skill in the art.

For synthesis of a compound provided herein containing an
alkaline-labile linker, 1,4-di(hydroxymethyl)benzene (i.e., M) is
mono-protected, e.g., as the corresponding mono-tert-butyldimethylsilyl ether.
The remaining free alcohol is derivatized as the corresponding
2-cyanoethyl-N,N-diisopropylphosphoramidite by reaction with
2-cyanoethyl-N,N-diisopropylchlorophosphoramidite. Reaction of this
amidite with an oligonucleotide (i.e., Q) is followed by removal of the
protecting group to provide the corresponding alcohol. Reaction with, e.g.,
trichloromethyl chloroformate affords the illustrated chloroformate
(i.e., X).



[Reaction scheme not reproduced; the only surviving label is
"oligonucleotide".]



For the synthesis of a compound provided herein containing a
photocleavable linker, 2-nitro-5-hydroxybenzaldehyde (i.e., a precursor of
L) is reacted with, e.g., 3-bromo-1-propanol to give the corresponding
ether-alcohol. The alcohol is then protected, e.g., as the corresponding
tert-butyldimethylsilyl ether. Reaction of this compound with
trimethylaluminum gives the corresponding benzyl alcohol, which is
derivatized as its phosphoramidite using the procedure described above.
The amidite is reacted with an oligonucleotide (i.e., Q), followed by
removal of the protecting group and derivatization of the resulting alcohol
as the corresponding chloroformate (i.e., X).

For the synthesis of the compounds provided herein containing an
acid labile linker, e.g., a heterobifunctional trityl ether, the requisite
phosphoramidite trityl ether is reacted with the oligonucleotide or
oligonucleotide analog Q, followed by deprotection of the trityl ether and
capture of a biomolecule, e.g., a protein, on the alcohol via a reactive
derivative of the alcohol (X), as described above.

The above syntheses are exemplary only. One of skill in the art
will be able to modify the above syntheses in a routine manner to
synthesize other compounds within the scope of the instant disclosure.
Syntheses of capture compounds as provided herein are within the skill of
the skilled artisan.

E. Methods of Use of the Compounds

The capture compounds provided herein can be used for the
analysis, quantification, purification and/or identification of the
components of biomolecule mixtures, including, but not limited to, protein
mixtures. They can be used to screen libraries of small molecules to
identify drug candidates, and they can be used to assess
biomolecule-biomolecule interactions and to identify biomolecule complexes
and intermediates, such as those in biochemical pathways and other
biological intermediates.

To initiate an analytical process, mixtures of biomolecules are
obtained or prepared. They can then be pre-purified or partially purified
as needed, according to standard procedures. Biomolecules are isolated
from samples using standard methods. Figure 20a depicts an exemplary
capture assay in which capture compounds are bound to biomolecules
and analyzed by MALDI-TOF MS. Example 9 and Figures 20b-f show
results of exemplary assays using a variety of capture compounds and
known proteins.

1. General methods

The collections provided herein have a wide variety of
applications, including reducing the complexity of mixtures of molecules,
particularly biomolecules, by contacting the collection with the mixtures
to permit covalent binding of molecules in the mixtures. The capture
compounds can be arrayed by virtue of the sorting function either
before, during or after the contacting. Following contacting and arraying,
the loci of the array each contain a subset of the molecules in the
mixture. The array can then be analyzed, such as by using mass
spectrometry.

For example, proteins are isolated from biological fluids and/or
tissues by cell lysis followed, for example, by either precipitation methods
(e.g., ammonium sulfate) or enzymatic degradation of the nucleic acids
and carbohydrates (if necessary), and the low molecular weight material is
removed by molecular sieving. Proteins also can be obtained from
expression libraries. Aliquots of the protein mixture are reacted with the
collections of capture compounds. Generally, members of the collection
have different functionalities, such as different reactivity and/or
selectivity, to separate the mixture into separate protein families
according to the selected reactivity of X or the reactivity function plus the
selectivity function. The diversity (number of different members) selected
for the sorting function Q depends on the complexity of the target mixture
of biomolecules, such as proteins. Hence, for example, where there are sets
of compounds differing in X, Y and solubility function, and Q is an
oligonucleotide, B is selected to be of an appropriate length to provide a
sufficient number of loci in the resulting array so that ultimately each
"spot" on the array has about 5 to 50 or so biomolecules bound to a
particular capture compound. In general, although not necessarily, all
capture compounds with a particular "Q" are the same, so that each "spot"
on the resulting array contains the same capture compounds. There are,
however, embodiments in which a plurality of different capture
compounds can have the same Q functionality.
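
The relationship between the length of B and the occupancy of each locus can be sketched numerically. The following Python example is a simplified illustration, not a prescribed procedure: it assumes that the biomolecules captured in one reaction distribute evenly over the 4^i loci, and the function name `choose_b_length` and its parameters are hypothetical.

```python
def choose_b_length(n_biomolecules, lo=5, hi=50, max_len=12):
    """Find the shortest B whose 4**i loci give an average of lo..hi
    biomolecules per locus, assuming an even distribution over the loci.

    Returns (i, number_of_loci, average_per_locus), or None if no length
    up to max_len fits the requested range.
    """
    for i in range(1, max_len + 1):
        loci = 4 ** i
        per_locus = n_biomolecules / loci
        if lo <= per_locus <= hi:
            return i, loci, per_locus
    return None

# Roughly 35,000 proteins captured in a single reaction (illustrative figure).
print(choose_b_length(35000))
```

Under these assumptions, a mixture that has already been divided by several X or X/Y reactions would call for a shorter B than the undivided mixture, since fewer biomolecules remain to be spread over the loci of each array.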

As noted, an array encompasses not only 2-D arrays on solid
supports but any collection that is addressable or in which members are
identifiable, such as by tagging with colored beads or RF tags or chemical
tags or symbologies on beads. "Spots" or loci on the array, or collections
in which capture compounds are sorted according to their "Q" function, are
separated.

In certain embodiments, the analysis is conducted using the
smallest possible number of reactions necessary to completely analyze
the mixture. Thus, in these embodiments, selection of the diversity of Q
and of the number of X and X/Y groups of different reactivity will be a
function of the complexity of the biomolecule mixture to be analyzed.
Minimization of the diversity of B and the number of X and/or X/Y groups
allows for complete analysis of the mixture with minimal complexity.

The separation of proteins from a complex mixture is achieved by
virtue of the compound-protein products bound to different members of
the collection. The supernatant, which contains the capture
compound-protein products, is contacted with support-bound or otherwise
labeled or addressed recipient molecules, such as oligonucleotides on a
support, and allowed to bind, such as by hybridization to an array of
complementary oligonucleotides. In one embodiment, a flat solid support
that carries, at spatially distinct locations, an array of oligonucleotides
or oligonucleotide analogs that is complementary to the selected
N^1_m-B_i-N^2_n oligonucleotide or oligonucleotide analog, is hybridized to
the capture compound-biomolecule products.

In embodiments where Z is an insoluble support or substrate, such
as a bead, separation of the compound-protein products into an
addressable array can be achieved by sorting into an array of microwell or
microtiter plates, or other microcontainer arrays, or by labeling with an
identifiable tag. The microwell or microtiter plates, or microcontainers,
can include single-stranded oligonucleotides or oligonucleotide analogs that
are complementary to the oligonucleotide or oligonucleotide analog Q.

After reaction or complexation of the compounds with the proteins,
any excess compounds can be removed by adding a reagent designed to
act as a "capturing agent." For example, a biotinylated small molecule,
which has a functionality identical or similar to that which reacted with
the selected X, is allowed to react with any excess compound. Exposure of
this mixture to streptavidin bound to a magnetic bead allows for removal
of the excess of the compounds.

Hybridization of the compound-protein products to a
complementary sequence is effected according to standard conditions
(e.g., in the presence of chaotropic salts to balance T_m values of the
various hybrids). Any non-hybridized material can be washed off and the
hybridized material analyzed.

In further embodiments, the methods herein use mixtures of the
compounds provided herein that have permuted Q groups to achieve
sorting of the biomolecules following reaction with the compounds.
These mixtures of compounds, in certain embodiments, have subsets
(e.g., 64 or 256 or 1024) of different X reagents out of the 4^i
permutations in Q, where i is the number of nucleotides or analogs
thereof contained in the B moiety of Q (e.g., 65,536 permutations for
i = 8). Reaction of the subsets separately with an aliquot of the
biomolecule mixture to be analyzed results in conjugate mixtures that can
be aligned with, e.g., a microtiter plate format (e.g., 96, 384, 1536,
etc.). Analysis using these subsets of compound mixtures provides further
sorting of the biomolecules prior to analysis.
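
One way to picture the alignment of subsets with a plate format is to split the 4^i permutations of B into equal subsets and assign one subset per well. The Python sketch below is illustrative only. It reads the subset figures above as the number of sequences per subset, which is an assumption, and the function names `partition_permutations` and `smallest_plate` are hypothetical.

```python
from itertools import product

def partition_permutations(i, subset_size):
    """Split all 4**i sequences of a B region of i positions into
    consecutive subsets of subset_size sequences each."""
    seqs = ["".join(p) for p in product("ACGT", repeat=i)]
    return [seqs[k:k + subset_size] for k in range(0, len(seqs), subset_size)]

def smallest_plate(n_subsets, formats=(96, 384, 1536)):
    """Return the smallest plate format with one well per subset,
    or None if none of the listed formats is large enough."""
    for wells in formats:
        if n_subsets <= wells:
            return wells
    return None

subsets = partition_permutations(8, 256)  # octanucleotide B, 256 per subset
print(len(subsets), smallest_plate(len(subsets)))
```

With an octanucleotide B, subsets of 256 sequences occupy 256 wells and so fit a 384-well plate, whereas subsets of 1024 sequences occupy 64 wells and fit a 96-well plate.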

In other embodiments, selective pooling of the products of different
X moiety-containing reagents (e.g., amino- and thiol-reactive X groups;
antibody and amino-reactive X groups; antibody and lectin X groups, etc.)
can be performed for combined analysis on a single assay (e.g., on a
single chip).

Figure 1 depicts an exemplary method for separation and analysis
of a complex mixture of proteins by use of MALDI-TOF mass
spectrometry. Exposure of a compound as described herein to a mixture
of biomolecules, including, but not limited to, proteins (P1 to P4), affords
a compound-protein array (NA = oligonucleotide moiety or
oligonucleotide analog moiety, L = cleavable linker, P = protein).
Separation of the array is effected by hybridization of the Q portion of the
array to a complementary sequence attached to a support, such as an
oligonucleotide chip. The proteins (P1 to P4) are then analyzed by
MALDI-TOF mass spectrometry.

When the complexity of a mixture of biomolecules, including, but
not limited to, proteins, is low, affinity chromatographic or affinity
filtration methods can be applied to separate the compound-protein
products from the protein mixture. If the proteins to be analyzed were
fluorescently labeled prior to (or after) reaction with the compound but
prior to hybridization, these labeled proteins also can be detected on the
array. In this way the positions that carry a hybrid can be detected prior
to scanning over the array with MALDI-TOF mass spectrometry and the
time to analyze the array minimized. Mass spectrometers of various kinds
can be applied to analyze the proteins (e.g., linear or with reflection, with
or without delayed extraction, with TOF, Q-TOFs or Fourier Transform
analyzer with lasers of different wavelengths and xy sample stages).

Mass spectrometry formats for use herein include, but are not
limited to, matrix assisted laser desorption ionization (MALDI), continuous
or pulsed electrospray (ES) ionization, ionspray, thermospray, or massive
cluster impact mass spectrometry and a detection format such as linear
time-of-flight (TOF), reflectron time-of-flight, single quadrupole, multiple
quadrupole, single magnetic sector, multiple magnetic sector, Fourier
transform, ion cyclotron resonance (ICR), ion trap, and combinations
thereof such as MALDI-TOF spectrometry. For example, for ES, the
samples, dissolved in water or in a volatile buffer, are injected either
continuously or discontinuously into an atmospheric pressure ionization
interface (API) and then mass analyzed by a quadrupole. The generation
of multiple ion peaks that can be obtained using ES mass spectrometry
can increase the accuracy of the mass determination. Even more detailed
information on the specific structure can be obtained using an MS/MS
quadrupole configuration.
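
The gain in accuracy from multiple ion peaks can be illustrated with the standard charge-state relation for positive-ion ES, m/z = (M + z·m_p)/z, where m_p is the proton mass. Each pair of adjacent charge states yields an independent estimate of M, and the estimates can be averaged. The Python sketch below is a simplified illustration that assumes protonated ions only; the function names and the protein mass used are hypothetical.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons

def mz_of(mass, z):
    """m/z of a positive ion of neutral mass `mass` carrying z protons."""
    return (mass + z * PROTON_MASS) / z

def mass_from_adjacent_peaks(mz_z, mz_z_plus_1):
    """Estimate charge and neutral mass from two adjacent charge-state peaks.

    mz_z is the peak of charge z (higher m/z); mz_z_plus_1 is the peak of
    charge z + 1 (lower m/z).
    """
    z = round((mz_z_plus_1 - PROTON_MASS) / (mz_z - mz_z_plus_1))
    return z, z * (mz_z - PROTON_MASS)

def average_mass(peaks):
    """Average the mass estimates from consecutive peaks, given in
    descending m/z order (ascending charge)."""
    estimates = [mass_from_adjacent_peaks(a, b)[1]
                 for a, b in zip(peaks, peaks[1:])]
    return sum(estimates) / len(estimates)

true_mass = 16950.0  # hypothetical protein mass, for illustration only
series = [mz_of(true_mass, z) for z in range(10, 16)]  # charges 10 to 15
print(round(average_mass(series), 3))
```

With measured rather than simulated peaks, each pairwise estimate carries its own error, and averaging over the series is what reduces the uncertainty of the final mass.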

Methods for performing MALDI are known to those of skill in the
art. Numerous methods for improving resolution are also known. For
example, resolution in MALDI-TOF mass spectrometry can be improved
by reducing the number of high energy collisions during ion extraction
(see, e.g., Juhasz et al. (1996) Anal. Chem. 68:941-946; see
also, e.g., U.S. Patent No. 5,777,325, U.S. Patent No. 5,742,049, U.S.



WO 03/092581 PCT/US02/22821

-122- 

Patent No. 5,654,545, U.S. Patent No. 5,641,959 and U.S. Patent No.
5,760,393 for descriptions of MALDI and delayed extraction protocols).
Conditioning of molecules to be analyzed or of the capture-compound
bound biomolecules prior to analysis also can be employed.

In MALDI mass spectrometry (MALDI-MS), various mass analyzers
can be used, e.g., magnetic sector/magnetic deflection instruments in
single or triple quadrupole mode (MS/MS), Fourier transform and
time-of-flight (TOF), including orthogonal time-of-flight (O-TOF),
configurations as is known in the art of mass spectrometry. For the
desorption/ionization process, numerous matrix/laser combinations can be
used. Ion-trap and reflectron configurations also can be employed.

MALDI-MS requires the biomolecule to be incorporated into a
matrix. It has been performed on polypeptides and on nucleic acids
mixed in a solid (i.e., crystalline) matrix. The matrix is selected so that it
absorbs the laser radiation. In these methods, a laser, such as a UV or
IR laser, is used to strike the biomolecule/matrix mixture, which is
crystallized on a probe tip or other suitable support, thereby effecting
desorption and ionization of the biomolecule. In addition, MALDI-MS has
been performed on polypeptides with glycerol and other liquids as a matrix.

A complex protein mixture can be selectively dissected and, taking
all data together, completely analyzed through the use of
compounds with different functionalities X. The proteins present in a
mixture of biological origin can be detected because all proteins have
reactive functionalities present on their surfaces. If each position on
the compound-protein array carries the same protein, either attached so as
to be cleavable under the same conditions as L or added without covalent
attachment to the solid support, to serve as an internal molecular weight
standard, the relative amount of each protein (or peptide, if the protein
array was enzymatically digested) can be determined. This process allows
for the detection of changes in expressed proteins when comparing tissues
from healthy and diseased individuals, or when comparing the same tissue
under different physiological conditions (e.g., time dependent studies).
The process also allows for the detection of changes in expressed proteins
when comparing different sections of tissues (e.g., tumors), which can be
obtained, e.g., by laser biopsy.
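
The use of an internal standard for relative quantification can be sketched as a simple normalization. The Python example below is illustrative only: it assumes that peak intensity responds linearly to the amount of protein, which is a simplification, and the function names and intensity values are hypothetical.

```python
def relative_amounts(peak_intensities, standard_intensity):
    """Normalize each protein peak to the internal standard recorded at the
    same array position. Assumes a linear intensity response."""
    if standard_intensity <= 0:
        raise ValueError("internal standard peak must have positive intensity")
    return {name: intensity / standard_intensity
            for name, intensity in peak_intensities.items()}

def fold_changes(healthy, disease):
    """Ratio of normalized amounts (disease / healthy) for proteins seen
    in both samples with a nonzero healthy amount."""
    return {name: disease[name] / healthy[name]
            for name in healthy
            if name in disease and healthy[name] > 0}

# Hypothetical intensities for two proteins at one array position.
healthy = relative_amounts({"P1": 200.0, "P2": 50.0}, standard_intensity=100.0)
disease = relative_amounts({"P1": 300.0, "P2": 37.5}, standard_intensity=75.0)
print(fold_changes(healthy, disease))
```

Normalizing to the standard before comparing samples compensates for differences in overall signal between spots or between spectra, which would otherwise be mistaken for changes in expression.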

Protein-protein interactions and protein-small molecule (e.g., drug)
interactions can be studied by contacting the compound-protein array
with a mixture of the molecules of interest. In this case, a compound will
be used that has no cleavable linkage L, or that has a linkage L that is
stable under MALDI-TOF MS conditions. Subsequent scanning of the
array with the mass spectrometer demonstrates that hybridized proteins
of the protein array have effectively interacted with the protein or small
molecule mixtures of interest.

Analysis using the well known 2-hybrid methodology is also
possible and can be detected via mass spectrometry. See, e.g., U.S.
Patent Nos. 5,512,473, 5,580,721, 5,580,736, 5,955,280, 5,695,941.
See also, Brent et al. (1996) Nucleic Acids Res. 24(17):3341-3347.

In the above embodiments, including those where Z contains a
cleavable linkage, the compounds can contain a mass modifying tag. In
these embodiments, the mass modifying tag is used to analyze the
differences in structure (e.g., side chain modification such as
phosphorylation or dephosphorylation) and/or expression levels of
biomolecules, including proteins. In one embodiment, two compounds (or
two sets of compounds having identical permuted B moieties) are used
that only differ in the presence or absence of a mass modifying tag (or
have two mass tags with appropriate mass differences). One compound
(or one set of compounds) is (are) reacted with "healthy" tissue and the
mass modified compound(s) are reacted with the "disease" tissue under
otherwise identical conditions. The two reactions are pooled and
analyzed in a duplex mode. The mass differences will elucidate those
proteins that are altered structurally or expressed in different quantity in
the disease tissue. Three or more mass modifying tags can be used in
separate reactions and pooled for multiplex analysis to follow the
differences during different stages of disease development (i.e., mass
modifying tag 1 at time point 1, mass modifying tag 2 at time point 2,
etc.), or, alternatively, to analyze different tissue sections of a disease
tissue such as a tumor sample.
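
The duplex readout can be pictured as a search for pairs of peaks separated by the mass of the tag. The Python sketch below is a simplified illustration, not a description of an actual analysis pipeline: it assumes singly charged ions, one tag per protein, and a hypothetical tag mass difference of 14 Da, and the function name `pair_duplex_peaks` is likewise hypothetical.

```python
def pair_duplex_peaks(peaks, tag_delta, tol=0.5):
    """Pair untagged and mass-tagged peaks in a pooled duplex spectrum.

    peaks: list of (mass, intensity) tuples with positive intensities.
    tag_delta: mass added by the mass modifying tag.
    Returns (untagged_mass, untagged_intensity, tagged_intensity, ratio)
    for every pair whose spacing matches tag_delta within tol.
    """
    ordered = sorted(peaks)
    pairs = []
    for m1, i1 in ordered:
        for m2, i2 in ordered:
            if abs((m2 - m1) - tag_delta) <= tol:
                pairs.append((m1, i1, i2, i2 / i1))
    return pairs

# Hypothetical pooled spectrum: "healthy" untagged, "disease" tagged (+14 Da).
spectrum = [(10000.0, 100.0), (10014.0, 300.0),
            (20000.0, 50.0), (20014.0, 50.0)]
for pair in pair_duplex_peaks(spectrum, tag_delta=14.0):
    print(pair)
```

A tagged-to-untagged intensity ratio that departs from 1 points to a difference in expression between the two samples, while a peak that lacks its expected partner, or whose partner is offset by more than the tag mass, suggests a structural alteration such as a side chain modification.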

Selectivity in the reaction of the compounds provided herein with a
biomolecule mixture, such as a protein mixture, also can be achieved by
performing the reactions under kinetic control and by withdrawing
aliquots at different time intervals. Alternatively, different parallel
reactions can be performed (for example, all differing in the B moiety of
the Q group) and either performed with different stoichiometric ratios or
stopped at different time intervals and analyzed separately.

In embodiments where the capture compounds provided herein
possess a luminescent or colorimetric group, the immobilized
compound-biomolecule conjugate can be viewed on the insoluble support
prior to analysis. Viewing the conjugate provides information about where
the conjugate has hybridized (such as for subsequent MALDI-TOF mass
spectrometry analysis). In certain embodiments, with selected reagents,
the quantity of a given protein from separate experiments (e.g., healthy
vs. disease, time point 1 vs. time point 2, etc.) can be determined by
using dyes that can be spectrophotometrically differentiated.

In other embodiments, the methods are performed by tagging the 
biomolecules to be analyzed, including but not limited to proteins, with 
more than one, in one embodiment three to five, of the compounds 
provided herein. Such compounds possess functionality designed to 
target smaller chemical features of the biomolecules rather than a 
macromolecular feature. See, e.g., Figure 3. Such smaller chemical 
features include, but are not limited to, NH2, SH, SS (after capping SH, SS 
can be targeted by, e.g., gold), and OH. In one non-limiting example, the 
phenolic OH of tyrosine is selectively captured using a diazo compound, 
such as an aryldiazonium salt. In this embodiment, the reaction can be 
performed in water. For example, a functionalized diazonium salt could 
be used where the functionality allows for subsequent capture of a 
compound provided herein, thereby providing an oligonucleotide-labelled 
biomolecule. One such functionalized diazonium salt is: 







A biomolecule modified with this reagent is then labelled with an 
oligonucleotide possessing a diene residue. It is appreciated by those of 
skill in the art that many reagent couples other than dienophile/diene can 
be used in these embodiments. In the case of dienophile/diene, the 
reaction of the dienophile with the diene can be performed in the 
presence of many other functional groups, including 
N-hydroxysuccinimido-activated oligonucleotides reacting with an NH2 
group. Thus, these two specific labelling reactions can be performed in 
one reaction. See, e.g., Figure 5. 

Subsequently, the multiply-tagged biomolecules are hybridized on 
an array of antisense oligonucleotides, in one embodiment a chip 
containing an array of antisense oligonucleotides. Such multiply-tagged 
biomolecules can be sorted with greater selectivity than singly tagged 
biomolecules. See, e.g., Figure 4. 

In embodiments where the compounds for use in the methods 
provided herein are insoluble or poorly soluble in water or aqueous 
buffers, organic solvents are added to the buffers to improve solubility. 
In one embodiment, the ratio of buffer:organic solvent is such that 
denaturation of the biomolecule does not occur. In another embodiment, 
the organic solvents used include, but are not limited to, acetonitrile, 
formamide and pyridine. In another embodiment, the ratio of 
buffer:organic solvent is about 4:1. To determine if an organic co-solvent 
is needed, the rate of reaction of the compounds provided herein with a 
water-soluble amine, such as 5'-aminothymidine, is measured. For 
example, the following reaction is performed in a variety of solvent 
mixtures well known to those of skill in the art to determine optimal 
conditions for subsequent biomolecule tagging and analysis: 






2. Phenotype analyses 

The collections of capture compounds permit a top down holistic approach to 
analysis of the proteome and other biomolecules. As noted, the 
collections and methods of use provide an unbiased way to analyze 
biomolecules, since the methods do not necessarily assess specific 
classes of targets, but rather detect or identify changes in the samples. 
The changes identified include structural changes that are related to the 
primary sequences and modifications, including post-translational 
modifications. In addition, since the capture compounds can include a 
solubility function they can be designed for reaction in hydrophobic 
conditions, thereby permitting analysis of membrane-bound and 
membrane-associated molecules, particularly proteins. 

Problems with proteome analysis arise from genetic variation that 
is not related to a target phenotype, proteome variation due to 
differences such as gender, age and metabolic state, the complex mixtures 
of cells in target tissues, and variations from cell cycle stage. Thus, to 
identify or detect changes, such as disease-related changes, among the 
biomolecule components of tissues and cells, homogeneity of the sample 
can be important. To provide homogeneity, cells with different 
phenotypes, such as diseased versus healthy, from the same individual 
are compared. As a result, differences in patterns of biomolecules can 
be attributed to the differences in the phenotype rather than to 
differences among individuals. Hence, samples can be obtained from a 
single individual and cells with different phenotypes, such as healthy 
versus diseased and responders versus non-responders, are separated. In 
addition, the cells can be synchronized or frozen into a metabolic state to 
further reduce background differences. 

Thus, the collections of capture compounds can be used to identify 
phenotype-specific proteins or modifications thereof or other 
phenotype-specific biomolecules and patterns thereof. This can be achieved by 
comparing biomolecule samples from cells or tissues with one phenotype 
to biomolecule samples from the equivalent cells or tissues with 
another phenotype. Phenotypes in cells from the same individual and cell 
type are compared. In particular, primary cells, primary cell culture and/or 
synchronized cells are compared. The patterns of binding of 
biomolecules from the cells to capture compound members of the 
collection can be identified and used as a signature or profile of a disease 
or healthy state or other phenotypes. The particular bound biomolecules, 
such as proteins, also can be identified, and new disease-associated 
markers, such as particular proteins or structures thereof, can 
be identified. Example 6 provides an exemplary embodiment in which 
cells are separated. See also Figure 19. 

Phenotypes for comparison include, but are not limited to: 

1) samples from diseased versus healthy cells or tissues to identify 
proteins or other biomolecules associated with disease or that are 
markers for disease; 

2) samples from drug responders and non-responders (i.e., only 20-30% 
of malignant melanoma patients respond to alpha interferon and 
others do not) to identify biomolecules indicative of response; 

3) samples from cells or tissues with a toxicity profile to drugs or 
environmental conditions to identify biomolecules associated with the 
response or a marker of the response; and 

4) samples from cells or tissues exposed to any condition or 
exhibiting any phenotype in order to identify biomolecules, such as 
proteins, associated with the response or phenotype or that are a marker 
therefor. 

Generally the samples for each phenotype are obtained from the 
same organism, such as from the same mammal, so that the cells are 
essentially matched and any variation should reflect variation due to the 
phenotype, not the source of the cells. Samples can be obtained from 
primary cells (or tissues). In all instances, the samples can be obtained 
from the same individual either before exposure or treatment or from 
healthy non-diseased tissue in order to permit identification of 
phenotype-associated biomolecules. 

Cells can be separated by any suitable method that permits 
identification of a particular phenotype and then separation of the cells 
based thereon. Any separation method in which the live cells are recovered 
can be used, such as, for example, panning or negative panning (where 
unwanted cells are captured and the wanted cells remain in the supernatant). 
These methods include, but are not limited to: 

1) flow cytometry; 

2) specific capture; 

3) negative panning in which unwanted cells are captured and the 
targeted cells remain in the supernatant and live cells are recovered for 
analysis; and 

4) Laser Capture Microdissection (LCM) (Arcturus, Inc., Mountain 
View, CA). 

Thus sorting criteria include, but are not limited to, membrane 
potential, ion flux, enzymatic activity, cell surface markers, disease 
markers, and other such criteria that permit separation of cells from an 
individual based on phenotype. 

a) Exemplary separation methods 

1) Laser Capture Microdissection 

Laser Capture Microdissection (LCM) (Arcturus, Inc., Mountain 
View, CA) uses a microscope platform combined with a low-energy IR 
laser to activate a plastic capture film onto selected cells of interest. 
The cells are then gently lifted from the surrounding tissue. This 
approach precludes any absorption of laser radiation by microdissected 
cells or surrounding tissue, thus ensuring the integrity of RNA, DNA, and 
protein prepared from the microdissected samples for downstream 
analysis. 

2) Flow cytometry for separation 

Flow cytometry is a method, somewhat analogous to fluorescence 
microscopy, in which measurements are performed on particles (cells) in 
liquid suspension, which flow one at a time through a focused laser beam 
at rates up to several thousand particles per second. Light scattered and 
fluorescence emitted by the particles (cells) is collected, filtered, digitized 
and sent to a computer for analysis. Typically flow cytometry measures 
the binding of a fluorochrome-labeled probe to cells and the comparison 
of the resultant fluorescence to the background fluorescence of 
unstained cells. Cells can be separated using a version of flow 
cytometry, flow sorting, in which the particles (cells) are separated and 
recovered from suspension based upon properties measured in flow. Cells 
that are recovered via flow sorting are viable and can be collected under 
sterile conditions. Typically, recovered subpopulations are in excess 
of 99.5% pure (see Figures 19a and 19b). 

Flow cytometry allows cells to be distinguished using various 
parameters including physical and/or chemical characteristics associated 
with cells or properties of cell-associated reagents or probes, any of 
which are measured by instrument sensors. 

Separation: Live v. Dead 

Forward and side scatter are used for preliminary identification and 
gating of cell populations. Scatter parameters are used to exclude debris, 
dead cells, and unwanted aggregates. In a peripheral blood or bone 
marrow sample, lymphocyte, monocyte and granulocyte populations can 
be defined, and separately gated and analyzed, on the basis of forward 
and side scatter. Cells that are recovered via flow sorting are viable and 
can be collected under sterile conditions. Typically, recovered 
subpopulations are in excess of 99.5% pure. 
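
As an illustration of scatter gating, the following Python sketch applies a simple rectangular gate on forward scatter (FSC) and side scatter (SSC) to exclude debris and aggregates. The event values, the gate limits and the function name `gate_events` are assumptions chosen only for illustration; in practice gates are set by the operator for each instrument and sample, and are often polygonal rather than rectangular.

```python
def gate_events(events, fsc_range, ssc_range):
    """Return the events whose FSC and SSC both fall inside the
    given inclusive (low, high) ranges."""
    fsc_lo, fsc_hi = fsc_range
    ssc_lo, ssc_hi = ssc_range
    return [
        e for e in events
        if fsc_lo <= e["fsc"] <= fsc_hi and ssc_lo <= e["ssc"] <= ssc_hi
    ]


# Hypothetical events in arbitrary instrument units.
events = [
    {"id": 1, "fsc": 40, "ssc": 15},    # low scatter: likely debris
    {"id": 2, "fsc": 420, "ssc": 110},  # inside the gate
    {"id": 3, "fsc": 460, "ssc": 130},  # inside the gate
    {"id": 4, "fsc": 980, "ssc": 900},  # very high scatter: likely aggregate
]

gated = gate_events(events, fsc_range=(300, 600), ssc_range=(50, 250))
gated_ids = [e["id"] for e in gated]
print(gated_ids)
```

Only the events that pass the gate would be carried forward to fluorescence analysis or sorting.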

Common cell sorting experiments usually involve 
immunofluorescence assays, i.e., staining of cells with antibodies 
conjugated to fluorescent dyes in order to detect antigens. In addition, 
sorting can be performed using GFP-reporter constructs in order to isolate 
pure populations of cells expressing a given gene/construct. 

a. Fluorescence 

Fluorescent parameter measurement permits investigation of cell 
structures and functions based upon direct staining, reactions with 
fluorochrome labeled probes (e.g., antibodies), or expression of 
fluorescent proteins. Fluorescence signals can be measured as single or 
multiple parameters corresponding to different laser excitation and 
fluorescence emission wavelengths. When different fluorochromes are 
used simultaneously, signal spillover can occur between fluorescence 
channels. This is corrected through compensation. Certain combinations 
of fluorochromes cannot be used simultaneously; those of skill in the art 
can identify such combinations. 
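
The compensation step mentioned above can be sketched for the two-fluorochrome case as the solution of a pair of linear equations. This is a minimal Python sketch under an assumed linear spillover model; the function name `compensate_two_color`, the spillover coefficients and the signal values are hypothetical, and real instruments determine the coefficients from single-stained controls.

```python
def compensate_two_color(obs_a, obs_b, spill_a_into_b, spill_b_into_a):
    """Recover the true signals of dyes A and B from two detectors.

    Assumed model:
        obs_a = true_a + spill_b_into_a * true_b
        obs_b = spill_a_into_b * true_a + true_b
    """
    det = 1.0 - spill_a_into_b * spill_b_into_a
    if det == 0:
        raise ValueError("spillover coefficients make the system singular")
    true_a = (obs_a - spill_b_into_a * obs_b) / det
    true_b = (obs_b - spill_a_into_b * obs_a) / det
    return true_a, true_b


# Hypothetical example: 15% of dye A leaks into detector B,
# 2% of dye B leaks into detector A.
true_a, true_b = compensate_two_color(1004.0, 350.0, 0.15, 0.02)
print(round(true_a, 3), round(true_b, 3))
```

With more than two fluorochromes the same idea generalizes to inverting a spillover matrix.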

b. Immunofluorescence 

Immunofluorescence involves the staining of cells with antibodies 
conjugated to fluorescent dyes such as FITC (fluorescein), PE 
(phycoerythrin), APC (allophycocyanin), and PE-based tandem 
conjugates (R670, CyChrome and others). Cell surface antigens are the 
usual targets of this assay, but antibodies can be directed at antigens or 
cytokines in the cytoplasm as well. 

DNA staining is used primarily for cell cycle profiling, or as one 
method for measuring apoptosis. Propidium iodide (PI), the most 
commonly used DNA stain, cannot enter live cells and can therefore be 
used for viability assays. For cell cycle or apoptosis assays using PI, cells 
must first be fixed in order for staining to take place (see protocol). The 
relative quantity of PI-DNA staining corresponds to the proportion of cells 
in G0/G1, S, and G2/M phases, with lesser amounts of staining indicating 
apoptotic/necrotic cells. PI staining can be performed simultaneously with 
certain fluorochromes, such as FITC and GFP, in assays to further 
characterize apoptosis or gene expression. 
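
The relationship between PI signal and cell cycle phase can be illustrated with a short Python sketch that labels each cell by its PI fluorescence relative to the G0/G1 peak. The thresholds, the `tolerance` parameter, the function name `classify_dna_content` and the signal values are assumptions for illustration; real analyses fit the DNA-content histogram rather than applying fixed cutoffs.

```python
from collections import Counter


def classify_dna_content(pi_signal, g1_peak, tolerance=0.15):
    """Assign a cell-cycle label from PI fluorescence relative to the
    position of the G0/G1 peak (one genome equivalent)."""
    ratio = pi_signal / g1_peak
    if ratio < 1.0 - tolerance:
        return "sub-G1"   # less DNA staining: apoptotic/necrotic
    if ratio <= 1.0 + tolerance:
        return "G0/G1"
    if ratio < 2.0 - tolerance:
        return "S"        # between one and two genome equivalents
    return "G2/M"         # about two genome equivalents (or more)


# Hypothetical PI signals for seven fixed cells, G0/G1 peak at 100.
signals = [100, 98, 150, 200, 45, 205, 102]
counts = Counter(classify_dna_content(s, g1_peak=100) for s in signals)
print(dict(counts))
```

The resulting counts give the proportion of cells in each phase for the measured population.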

Gene expression and transfection can be measured indirectly by 
using a reporter gene in the construct. Green Fluorescent Protein-type 
constructs (EGFP, red and blue fluorescent proteins) and β-galactosidase, 
for example, can be used to quantify populations of those cells 
expressing the gene/construct. Mutants of GFP are now available that 
can be excited at common frequencies, but emit fluorescence at different 
wavelengths. This allows for measurement of co-transfection, as well as 
simultaneous detection of gene and antibody expression. Appropriate 
negative (background) controls for experiments involving GFP-type 
constructs should be included. Controls include, for example, the same 
cell type, using the gene insert minus the GFP-type construct. 

3) Metabolic Studies and other studies 

Annexin-V can be labeled with various fluorochromes in order to identify 
cells in early stages of apoptosis. CFSE binds to cell membranes and is 
equally distributed when cells divide. The number of divisions cells 
undergo in a period of time can then be counted. CFSE can be used in 
conjunction with certain fluorochromes for immunofluorescence. Calcium 
flux can be measured using Indo-1 markers. This can be combined with 
immunofluorescent staining. Intercellular conjugation assays can be 
performed using combinations of dyes such as calcein or hydroethidine. 
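
Because the CFSE label is split equally between daughter cells, the number of divisions can be estimated from the fold decrease in fluorescence. The following Python sketch assumes an ideal halving at each division and no other loss of dye; the function name `estimate_divisions` and the fluorescence values are hypothetical.

```python
import math


def estimate_divisions(initial_fluorescence, current_fluorescence):
    """Estimate the number of divisions, assuming the dye signal halves
    at each division: divisions = log2(initial / current)."""
    if initial_fluorescence <= 0 or current_fluorescence <= 0:
        raise ValueError("fluorescence values must be positive")
    if current_fluorescence >= initial_fluorescence:
        return 0
    return round(math.log2(initial_fluorescence / current_fluorescence))


# Hypothetical mean fluorescence of the undivided population: 1000.
for current in (1000, 250, 130):
    print(current, estimate_divisions(1000, current))
```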

b) Synchronizing cell cycles 
Once sorted or separated cells are obtained, they can be cultured 
and can be synchronized or frozen into a particular metabolic state. This 
enhances the ability to identify phenotype-specific biomolecules. Such 
cells can be separated by the above methods, including by flow 
cytometry. Further, cells in the same cell cycle, same metabolic state or 
other synchronized state can be separated into groups using flow 
cytometry (see, Figure 19c). 

Cell cycles can be synchronized or frozen by a variety of methods, 
including, but not limited to, chelation of critical ions, such as by 
removal of magnesium, zinc, manganese, cobalt and/or other ions that 
perform specific functions by EDTA or other chelators (see, e.g., 
EXAMPLES). Other methods include controlling various metabolic or 
biochemical pathways. Figure 18 depicts exemplary points of regulation 
of metabolic control mechanisms for cell synchronization. Examples of 
metabolic control for synchronizing or "freezing" cells 
include, but are not limited to, the following: 

1) control of gene expression; 

2) regulation of enzyme reactions; 

3) negative control: feedback inhibition or end product repression 
and enzyme induction are mechanisms of negative control that lead to a 
decrease in the transcription of proteins; 

4) positive control: catabolite repression is considered a form of 
positive control because it effects an increase in transcription of proteins; 

5) control of individual protein translation: 

a) oligonucleotides that hybridize to the 5' cap site 
inhibit protein synthesis by inhibiting the initial interaction between the 
mRNA and the ribosome 40S sub-unit; 

b) oligonucleotides that hybridize to the 5' UTR up to, and 
including, the translation initiation codon inhibit the scanning of the 40S 
(or 30S) subunit or assembly of the full ribosome (80S for eukaryotes or 
70S for bacterial systems); 

6) control of post-translational modification; and 

7) control of allosteric enzymes, where the active site binds to the 
substrate of the enzyme and converts it to a product. The allosteric site 
is occupied by some small molecule that is not a substrate. If the protein 
is an enzyme, when the allosteric site is occupied, the enzyme is inactive, 
i.e., the effector molecule decreases the activity of the enzyme. Some 
multicomponent allosteric enzymes have several sites occupied by various 
effector molecules that modulate enzyme activity over a range of 
conditions. 

3. Analysis of low abundancy proteins 

Important disease-associated markers and targets could be low 
abundancy proteins that might not be detected by mass spectrometry. 
To ensure detection, a first capture compound display experiment can be 
performed. The resulting array of captured proteins is reacted with a 
non-selective dye, such as a fluorescent dye, that will light up or render 
visible more proteins on the array. The dye can provide a 
semi-quantitative estimate of the amount of a protein. The number of different 
proteins detected by the dye can be determined and then compared to the 
number detected by mass spectrometry analysis. If there are more 
proteins detected using the dye, the experiments can be repeated using a 
higher starting number of cells so that low abundance proteins can be 
detected and identified by the mass spectrometric analysis. 
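
The comparison between dye detection and mass spectrometric detection can be sketched as a set difference over array positions. In this minimal Python sketch the spot labels and the function name `spots_missed_by_ms` are hypothetical; a non-empty result would indicate that the experiment should be repeated with a higher starting number of cells.

```python
def spots_missed_by_ms(dye_spots, ms_spots):
    """Return array positions where the dye shows protein but mass
    spectrometry gave no identifiable signal."""
    return sorted(set(dye_spots) - set(ms_spots))


# Hypothetical array positions.
dye_spots = {"A1", "A2", "B3", "C4"}
ms_spots = {"A1", "B3"}

missed = spots_missed_by_ms(dye_spots, ms_spots)
repeat_with_more_cells = len(missed) > 0
print(missed, repeat_with_more_cells)
```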

For example, housekeeping proteins, such as actin and other such 
proteins, are present in high abundance and can mask low abundancy 
proteins. Capture compounds or other purification compounds can be selected or 
designed to capture or remove the high abundancy proteins or 
biomolecules from a mixture before using a collection to assess the 
components of the mixture. Once the high abundancy proteins are 
removed, low abundancy proteins have an effectively higher concentration 
and can be detected. These methods, thus, have two steps: a first step 
to capture high abundancy components of biomolecule mixtures, such as 
the actins, and a second step to assess the remaining components. For 
example, a cell lysate can be contacted with capture 
molecules that include a reactivity group such as biotin or other general 
reactivity function linked to a sorting group to remove such high 
abundancy proteins, and a suitable collection of capture 
compounds is then used to identify lower abundancy compounds remaining in the 
lysate. 

Also, as discussed above, capture compounds can be 
designed, such as by appropriate selection of W, to interact with 
intact organelles before disrupting them in cells that have been gently 
lysed or otherwise treated to permit access to organelles and internal 
membranes. Then the captured organelles can be disrupted, such as on a 
support, which can include an artificial membrane, such as a lipid bilayer or micelle 
coating, to capture the organelle proteins and other biomolecules in an 
environment that retains their three-dimensional structure. These 
captured proteins can be analyzed. This permits the capture compounds 
to interact with the captured proteins and other biomolecules in their 
native tertiary structure. 

4. Monitoring protein conformation as an indicator of disease 

The collections and/or members thereof can be used to detect or 
distinguish specific conformers of proteins. Hence, for example, if a 
particular conformation of a protein is associated with a disease (or 
healthy state), the collections or members thereof can detect one 
conformer or distinguish conformers based upon a pattern of binding to the 
capture compounds in a collection. Thus, the collections and/or members 
thereof can be used to detect conformationally altered protein diseases 
(or diseases of protein aggregation), where a disease-associated protein 
or polypeptide has a disease-associated conformation. The methods and 
collections provided herein permit detection of a conformer associated 
with a disease. These diseases include, but are not 
limited to, amyloid diseases and neurodegenerative diseases. Other 
diseases and associated proteins that exhibit two or more different 
conformations in which at least one conformation is associated with disease include 
those set forth in the following Table: 



WO 03/092581    PCT/US02/22821 



Disease                                     Insoluble protein
------------------------------------------  ------------------------------------------
Alzheimer's Disease (AD)                    APP, Aβ, α1-antichymotrypsin, tau, non-Aβ
                                            component, presenilin 1, presenilin 2
Prion diseases, including but not limited   PrPSc
to, Creutzfeldt-Jakob disease, scrapie,
bovine spongiform encephalopathy
Amyotrophic lateral sclerosis (ALS)         superoxide dismutase (SOD) and neurofilament
Pick's Disease                              Pick body
Parkinson's disease                         α-synuclein in Lewy bodies
Frontotemporal dementia                     tau in fibrils
Diabetes Type II                            amylin
Multiple myeloma                            IgGL-chain
Plasma cell dyscrasias
Familial amyloidotic polyneuropathy         Transthyretin
Medullary carcinoma of thyroid              Procalcitonin
Chronic renal failure                       β2-microglobulin
Congestive heart failure                    Atrial natriuretic factor
Senile Cardiac and systemic amyloidosis     transthyretin
Chronic inflammation                        Serum Amyloid A
Atherosclerosis                             ApoA1
Familial amyloidosis                        Gelsolin
Huntington's disease                        Huntingtin



The collections can be contacted with a mixture of the conformers 
and the members that bind or retain each form can be identified, and a 
pattern thus associated with each conformer. Alternatively, those that 
bind to only one conformer, such as the conformer associated with 
disease, can be identified, and sub-collections of one or more of such 
capture compounds can be used as a diagnostic reagent for the disease. 
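
The selection of capture compounds that bind only one conformer can be sketched as a filter over a binding table. In the Python sketch below, the compound identifiers, the conformer labels and the function name `conformer_specific` are hypothetical and the binding data are invented for illustration.

```python
def conformer_specific(binding):
    """Given a mapping of capture-compound id -> set of conformers it
    binds, return a mapping of conformer -> sorted list of compounds
    that bind that conformer and no other."""
    specific = {}
    for compound, conformers in binding.items():
        if len(conformers) == 1:
            (conformer,) = tuple(conformers)
            specific.setdefault(conformer, []).append(compound)
    return {c: sorted(ids) for c, ids in specific.items()}


# Hypothetical binding pattern for five capture compounds.
binding = {
    "cc1": {"normal", "disease"},  # binds both: not diagnostic alone
    "cc2": {"disease"},
    "cc3": {"normal"},
    "cc4": set(),                  # binds neither
    "cc5": {"disease"},
}

sub_collections = conformer_specific(binding)
print(sub_collections)
```

Under these assumptions, the compounds listed for the disease-associated conformer would form the candidate sub-collection for use as a diagnostic reagent.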
5. Small molecule identification and biomolecule-biomolecule 
interaction investigation 

Biomolecules, such as proteins, are sorted using a covalent or 
noncovalent interaction with immobilized capture compounds. 
Collections, such as arrays of capture compounds bound to 
biomolecules, such as from cell lysates, then can be used to screen 
libraries or other mixtures of drug candidates or to further screen mixtures 
of biomolecules to see what binds to the bound biomolecules. The 
captured biomolecule-biomolecule complexes or biomolecule-drug 
candidate complexes can be analyzed to identify biochemical pathways 
and also to identify targets of the candidate drug. 

For example, protein-protein or protein-biomolecule interactions are 
exposed to test compounds, typically small molecules, including small 
organic molecules, peptides, peptide mimetics, antisense molecules or 
dsRNA, antibodies, fragments of antibodies, recombinant and synthetic 
antibodies and fragments thereof and other such compounds that can 
serve as drug candidates or lead compounds. Bound small molecules are 
identified by mass spectrometry or other analytical methods. 

F. Systems 

In further embodiments, the compounds and the methods 
described herein are designed to be placed into an integrated system that 
standardizes and automates the following process steps: 

• Isolation of biomolecules from a biological source, including 
isolation of the proteins from cell lysates (lysis, enzymatic 
digestion, precipitation, washing) 

• Optionally, removal of low molecular weight materials 

• Optionally, aliquoting the biomolecule mixture, such as a 
protein mixture 

• Reaction of the biomolecule mixture, such as a protein 
mixture, with compounds of different chemical reactivity (X) 
and sequence diversity (B) provided herein; this step can be 
performed in parallel using aliquots of the biomolecule 
mixture 

• Optionally, removal of excess compound 

• Hybridization of the compound-biomolecule conjugate, such 
as a compound-protein conjugate, to single stranded 
oligonucleotides or oligonucleotide analogs that are 
complementary to the Q moiety of the compound; the single 
stranded oligonucleotides or oligonucleotide analogs are 
optionally presented in an array format and are optionally 
immobilized on an insoluble support 

• Optionally, subsequent chemical or enzymatic treatment of 
the protein array 

• Analysis of the biomolecule array, including, but not limited 
to, the steps of (i) deposition of matrix, and (ii) spot-by-spot 
MALDI-TOF mass spectrometry using an array mass 
spectrometer (with or without internal, e.g., on-chip 
molecular weight standard for calibration and quantitation). 

The system includes the collections provided herein, optionally 
arrays of such collections, software for control of the processes of 
sample preparation and instrumental analysis and for analysis of the 
resulting data, and instrumentation, such as a mass spectrometer, for 
analysis of the biomolecules. The system can include other devices, such as 
liquid chromatographic devices, so that a protein mixture is at least 
partially separated. The eluent is collected in a continuous series of 
aliquots into, e.g., microtiter plates, and each aliquot is reacted with a 
capture compound provided herein. 

In multiplex reactions, aliquots in each well can simultaneously 
react with one or more of the capture compounds provided herein that, 
for example, each differ in X (i.e., amino, thiol, lectin specific 
functionality), with each having a specific and differentiating selectivity 
moiety Y, and in the Q group. Chromatography can be done in aqueous or 
in organic medium. The resulting reaction mixtures are pooled and 
analyzed directly. Alternatively, subsequent secondary reactions or 
molecular interaction studies are performed prior to analysis, including 
mass spectrometric analysis. 

The systems provided herein can contain an assembly line in which, 
for example, pipetting robots on xy stages and reagent supply/washing modules are 
linked with a central separation device and a terminal mass spectrometer 
for analysis and data interpretation. The systems can be programmed to 
perform process steps including (see, e.g., FIG. 2), for example: 

1) Cell cultures (or tissue samples) are provided in microtiter 
plates (MTPs) with 1, 2...i wells. To each well, solutions are 
added for lysis of cells, thereby liberating the proteins. In 
some embodiments, appropriate washing steps are included, 
as well as addition of enzymes to digest nucleic acids and 
other non-protein components. In further embodiments, 
instead of regular MTPs, MTPs with filter plates in the 
bottom of wells are used. Cell debris is removed either by 
filtration or centrifugation. A conditioning solution for the 
appropriate separation process is added and the material 
from each well is separately loaded onto the separation device. 

2) Separation utilizes different separation principles such as 
charge, molecular sizing, adsorption, ion-exchange, and 
molecular exclusion principles. Depending on the sample 
size, appropriate dimensions are utilized, such as 
microbore high performance liquid chromatography (HPLC). 
In certain embodiments, a continuous flow process is used 
and the effluent is continuously aliquotted into MTP 1, 2...n. 

3) Reaction with Proteome Reagents. Each MTP in turn is 
transferred to a Proteome Reagent Station harboring 1, 2...m 
reagents differing only in the oligonucleotide sequence 
part (i.e., Q) and/or in the chemical nature of the 
functionality reacting with the proteins (i.e., X). If there is 
more than one MTP coming from one tissue sample, then 
reagent 1 is added to the same well of the respective MTPs 
1, 2...n, i.e., in well A1, reagent 2 in well A2, etc. In 
embodiments where the MTPs have 96 wells (i = 1-96), 96 
different Proteome Reagents (i.e., 96 different compounds 
provided herein, m = 1-96) are supplied through 96 different 
nozzles from the Proteome Reagent Station to prevent 
cross-contamination. 

4) Pooling: Excess Proteome Reagent is deactivated, aliquots 
from each well belonging to one and the same tissue 
sample are pooled, and the remaining material is stored at 
conditions that preserve the structure (and if necessary 
conformation) of the proteins intact, thereby serving as 
master MTPs for subsequent experiments. 

5) Excess Proteome Reagent is removed in the pooled sample 
using, e.g., the biotin/streptavidin system with magnetic 
beads, then the supernatant is concentrated and conditioned 
for hybridization. 

6) Transfer to an Oligonucleotide Chip. After a washing step to 
remove non-hybridized and other low molecular weight 
material, a matrix is added. Alternatively, before matrix 
addition, a digestion with, e.g., trypsin and/or chymotrypsin 
is performed. After washing out the enzyme and the 
digestion products, the matrix is added. 

7) Transfer of chip to mass spectrometer. In one embodiment, 
MALDI-TOF mass spectrometry is performed. Other mass 
spectrometric configurations suitable for protein analysis also 
can be applied. The mass spectrometer has an xy stage and 
thereby rasters over each position on the spot for analysis. 
The Proteome Reagent can be designed so that most of the 
reagent part (including the part hybridizing with the 
oligonucleotide chip array) is cleaved either before or during 
mass spectrometry and therefore will be detected in the low 
molecular weight area of the spectrum and therefore will be 
well separated from the peptide (in case of enzymatic 
digestion) or protein molecular weight signals in the mass 
spectrum. 

8) Finally, the molecular weight signals can be processed for 
noise reduction, background subtraction and other such 
processing steps. The data obtained can be archived and 
interpreted. The molecular weight values of the proteins (or 
the peptides obtained after enzymatic digestion) are 
associated with the human DNA sequence information and 
the derived protein sequence information from the protein 
coding regions. An interaction with available databases will 
reveal whether the proteins and their functions are already 
known. If the function is unknown, the protein can be 
expressed from the known DNA sequence in sufficient scale 
using standard methods to elucidate its function and 
subsequent location in a biochemical pathway, where it 
plays its metabolic role in a healthy individual or in the 
disease pathway for an individual with disease. 

Since the master plates containing aliquots from the different 
proteins within a given tissue sample have been stored and are available, 
subsequent experiments then can be performed in a now preselected way, 
e.g., the proteins are displayed on the chip surface for protein-protein 
(biomolecule) interaction studies for target validation and/or to study the 
interaction with combinatorial libraries of small molecules for drug 
candidate selection. 
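
The reagent dispensing scheme of step 3 above (reagent 1 to well A1, reagent 2 to well A2, and so on, with the same well used on every MTP from one tissue sample) can be sketched in Python as follows. The row-major well order beyond A1 and A2, the 8 x 12 plate layout, and the names `well_for_reagent` and `dispensing_plan` are assumptions made for this illustration.

```python
ROWS = "ABCDEFGH"   # assumed 96-well layout: 8 rows
COLS = 12           # by 12 columns


def well_for_reagent(m):
    """Map reagent number m (1-96) to a well label, row by row:
    1 -> A1, 2 -> A2, ..., 12 -> A12, 13 -> B1, ..., 96 -> H12."""
    if not 1 <= m <= len(ROWS) * COLS:
        raise ValueError("reagent number must be between 1 and 96")
    row, col = divmod(m - 1, COLS)
    return f"{ROWS[row]}{col + 1}"


def dispensing_plan(n_plates):
    """List (plate, well, reagent) triples so that each reagent goes to
    the same well on every plate 1..n_plates."""
    return [
        (plate, well_for_reagent(m), m)
        for plate in range(1, n_plates + 1)
        for m in range(1, len(ROWS) * COLS + 1)
    ]


plan = dispensing_plan(3)
print(plan[0], plan[1], plan[-1])
```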
G. Bioinformatics 
The raw data generated from the analysis, such as mass
spectrometry analysis, of the compound-protein species is processed by
background subtraction, noise reduction, molecular weight calibration and
peak refinement (e.g., peak integration). The molecular weight values of
the cleaved proteins or the digestion products are interpreted and
compared with existing protein databases to determine whether the
protein in question is known and, if so, what modifications are present
(glycosylated or not glycosylated, phosphorylated or not phosphorylated,
etc.). The different sets of experiments belonging to one set of
compounds are combined, compared and interpreted. For example, one
set of experiments uses a set of compounds with one X moiety and
different Q moieties. This set of experiments provides data for a portion
of the proteome, since not all proteins in the proteome will react with a
given X moiety. Superposition of the data from this set of experiments
with data from other sets of experiments with different X moieties
provides data for the complete proteome.
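The database comparison described above can be sketched as follows. This is only a minimal illustration, assuming a small in-memory table of protein masses; the protein names, their masses, the tolerance and the function name `match_mass` are hypothetical. The two mass shifts are the usual monoisotopic values for addition of HPO3 and of one hexose residue.

```python
# Monoisotopic mass shifts in daltons for an unmodified protein and two
# common modifications (phosphorylation adds HPO3; one hexose adds C6H10O5).
MOD_SHIFTS = {
    "unmodified": 0.0,
    "phosphorylated": 79.966,
    "hexosylated": 162.053,
}

def match_mass(observed, database, tol=0.5):
    """Return (protein, modification) pairs whose expected mass lies
    within tol daltons of the observed mass."""
    hits = []
    for name, base_mass in database.items():
        for mod, shift in MOD_SHIFTS.items():
            if abs(observed - (base_mass + shift)) <= tol:
                hits.append((name, mod))
    return hits

# Hypothetical database entries, made up purely for illustration.
db = {"protein_A": 10000.0, "protein_B": 15250.4}

print(match_mass(10079.97, db))  # a peak consistent with phosphorylated protein_A
print(match_mass(12000.0, db))   # no match: the protein is treated as unknown
```

An empty result corresponds to the case in the text where the protein is not yet known and would be followed up separately.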

Sets of experiments comparing tissues of healthy and diseased
individuals or from different physiological or developmental stages (e.g.,
tumor progression, dependence on drug treatments to monitor the result of
therapy, immune response to viral or bacterial infection) or different
tissue areas (e.g., of a tumor) are investigated, and the final data
archived.

The following examples are included for illustrative purposes only
and are not intended to limit the scope of the invention.
Commercial grade solvents and reagents were used without
purification unless otherwise specified, and were purchased from the
following vendors: anhydrous THF (Aldrich), CH2Cl2 (Aldrich, Acros, EM
Science), CHCl3 (Aldrich, Mallinckrodt), hexanes (Acros, EM Science),
ethyl acetate (Aldrich, Acros), acetone (Aldrich, EM Science), methyl
alcohol (Aldrich), diethyl ether (Fisher Scientific), 4-bromobenzoic acid
(Aldrich), 2-amino-2-methyl-1-propanol (Acros),
1,3-dicyclohexylcarbodiimide (Aldrich), N-hydroxysuccinimide (Aldrich),
maleimide (Aldrich), 1-(3-dimethylaminopropyl)-3-ethylcarbodiimide
hydrochloride (Acros), thionyl chloride (Aldrich), pyridine (Aldrich),
magnesium turnings (Acros), 4-(diphenylhydroxymethyl)benzoic acid
(Fluka), sodium ethoxide (Acros), potassium carbonate, sodium iodide,
carbon tetrachloride, methyl iodide, Red-Al (Aldrich), anhydrous Na2SO4
(Acros), acetic acid (EM Science), sodium hydroxide (Acros), molecular
sieves 4 Å (Aldrich), and acetyl chloride (Aldrich). 1H NMR spectral data
were obtained from a 500 MHz NMR spectrometer using CDCl3 as a solvent.
Mass spectral data were analyzed using the electrospray method.

EXAMPLE 1 

Examples for N1m-Bi-N2n
a. N1 and N2 as identical tetramers, B as a trimer

N1 = N2, m = n = 4, i = 3, B = 64 sequence permutations

GTGC ATG GTGC
     AAG
     ACG
     AGG
     TTG
     CTG
     GTG
     . . .
     GGG

b. N1 and N2 as non-identical tetramers, B as a tetramer

N1 ≠ N2, m = n = 4, i = 4, B = 256 sequence permutations

GTCC ATCG CTAC
     AACG
     ACCG
     AGCG
     . . .
     GGGG

c. N1 as a heptamer, N2 as an octamer, B as an octamer

N1 ≠ N2, m = 7, n = 8, i = 8, B = 65,536 sequence permutations.

GCTGCCC ATTCGTAC GCCTGCCC
N1      Bi       N2
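The permutation counts quoted in cases a to c follow from 4^i, since each of the i positions of B can be A, C, G or T. A minimal sketch of that enumeration is given below; the function names `b_sequences` and `capture_oligos` are illustrative only and do not come from the specification.

```python
from itertools import product

def b_sequences(i):
    """All 4**i sequences of length i over the four DNA bases."""
    return ["".join(p) for p in product("ACGT", repeat=i)]

def capture_oligos(n1, n2, i):
    """Combine fixed flanks N1 and N2 with every possible B of length i."""
    return [f"{n1} {b} {n2}" for b in b_sequences(i)]

print(len(b_sequences(3)))   # case a: B as a trimer
print(len(b_sequences(4)))   # case b: B as a tetramer
print(4 ** 8)                # case c: B as an octamer, counted without listing
print(capture_oligos("GTGC", "GTGC", 3)[:3])
```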

EXAMPLE 2 
Separation of proteins on a DNA array 

N1m-Bi-N2n-(S1)t-M(R15)a-(S2)b-L-X-Protein, where B is a trimer;
m = n = 4, i = 3, t = b = 1; the flanking tetramers are N1 and N2.

 GTGC ATG GTGC - S1 - M(R15)a - S2 - L - X - Protein 1
-CACG TAC CACG

 GTGC AAG GTGC - S1 - M(R15)a - S2 - L - X - Protein 2
-CACG TTC CACG

 GTGC ACG GTGC - S1 - M(R15)a - S2 - L - X - Protein 3
-CACG TGC CACG

 . . .

 GTGC GGG GTGC - S1 - M(R15)a - S2 - L - X - Protein 64
-CACG CCC CACG
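The pairing between each capture compound and its array position can be illustrated by computing the base-by-base complement of the N1-B-N2 tag. The sketch below assumes the N1 = N2 = GTGC flanks of Example 1a; the function names are hypothetical. The complement is written aligned under the tag, as in the listing above, so read in its own 5' to 3' direction the support-bound strand is the reverse of what is printed.

```python
from itertools import product

# Watson-Crick pairing table for DNA; other characters (spaces) pass through.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(seq):
    """Base-by-base complement, aligned position for position with seq."""
    return seq.translate(COMPLEMENT)

def reverse_complement(seq):
    """Complement written in the opposite orientation."""
    return complement(seq)[::-1]

def array_probes(flank="GTGC"):
    """Map each of the 64 tags flank-B-flank to its aligned complement."""
    probes = {}
    for b in map("".join, product("ACGT", repeat=3)):
        tag = f"{flank} {b} {flank}"
        probes[tag] = complement(tag)
    return probes

print(complement("GTGC ATG GTGC"))
print(len(array_probes()))
```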



EXAMPLE 3 

I. Preparation of protein mixtures from cells or via protein translation
of a cDNA library prepared from cells or tissues

The protein mixtures can be selectively divided using physical or
biochemical separation techniques.



1. Preparation of limited complexity protein pools using cell
culture or tissue

Proteins can be isolated from cell culture or tissues according to
methods well known to those of skill in the art. The isolated proteins are
purified using methods well known to those of skill in the art (e.g., TPAE,
differential protein precipitation (precipitation by salts, pH, and ionic
polymers), differential protein crystallization, bulk fractionation,
electrophoresis (PAGE, isoelectric focusing, capillary), and
chromatography (immunoaffinity, HPLC, LC)). Individual column fractions
containing protein mixtures of limited complexity are collected for use as
antigen.

2. Preparation of limited complexity protein pools using cDNA
expression libraries (Figure 6)

a. RNA Isolation

i. Isolation of Total RNA

Cultured cells or tissues are homogenized in a denaturing solution
containing 4 M guanidine thiocyanate. The homogenate is mixed
sequentially with 2 M sodium acetate (pH 4), phenol, and finally
chloroform/isoamyl alcohol or bromochloropropane. The resulting mixture
is centrifuged, yielding an upper aqueous phase containing total RNA.
Following isopropanol precipitation, the RNA pellet is dissolved in
denaturing solution (containing 4 M guanidine thiocyanate), precipitated
with isopropanol, and washed with 75% ethanol.

ii. Isolation of Cytoplasmic RNA 

Cells are washed with ice-cold phosphate-buffered saline and kept
on ice for all subsequent manipulations. The pellet of harvested cells is
resuspended in a lysis buffer containing the nonionic detergent Nonidet
P-40. Lysis of the plasma membranes occurs almost immediately. The
intact nuclei are removed by a brief microcentrifuge spin, and sodium
dodecyl sulfate is added to the cytoplasmic supernatant to denature
protein. Protein is digested with protease and removed by extractions
with phenol/chloroform and chloroform. The cytoplasmic RNA is
recovered by ethanol precipitation.

b. mRNA purification 

Messenger RNA is purified from a total or cytoplasmic RNA
preparation using standard procedures. Poly(A)+ RNA can be separated
from total RNA by oligo(dT) binding to the poly(A) tail of the mRNA.
Total RNA is denatured to expose the poly(A) (polyadenylated) tails.
Poly(A)-containing RNA is then bound to magnetic beads coated with
oligo(dT) and separated from the total or cytoplasmic RNA through
magnetic forces. The mRNA population can be further enriched for the
presence of full-length molecules through the selection of 5'-cap
containing mRNA species.

c. cDNA synthesis
Different types of primers can be used to synthesize full length or
5'-end containing cDNA libraries from the isolated mRNA.

i. Oligo(dT) primer, which will generate cDNAs
for all mRNA species (Figure 7)

An example of the production of an adapted oligo(dT) primed cDNA
library is provided in Figure 7.

ii. Functional protein motif specific degenerate
oligonucleotides; these primers will generate cDNAs for a
limited number of genes belonging to the same
protein family or of functionally related proteins
(Figure 8)

An example of the production of an adapted sequence motif
specific cDNA library is provided in Figure 8.

iii. Gene specific oligonucleotide will produce cDNA
for only one mRNA species (Figure 9)

The oligonucleotides used for the cDNA production can contain
additional sequences: 1) protein tag specific sequences for easier
purification of the recombinant proteins (6xHis), 2) restriction enzyme
sites, 3) a modified 5'-end for cDNA purification or DNA construction
purposes (Figure 10).

The conversion of mRNA into double-stranded cDNA for insertion
into a vector is carried out in two parts. First, intact mRNA, hybridized to
an oligonucleotide primer, is copied by reverse transcriptase and the
products isolated by phenol extraction and ethanol precipitation. The
RNA in the RNA-DNA hybrid is removed with RNase H as E. coli DNA
polymerase I fills in the gaps. The second-strand fragments thus
produced are ligated by E. coli DNA ligase. Second-strand synthesis is
completed, residual RNA degraded, and cDNA made blunt with RNase H,
RNase A, T4 DNA polymerase, and E. coli DNA ligase.

d. Adapter ligation 

Adapter molecules can be ligated to both ends of the blunt ended
double stranded cDNA or to only one end of the cDNA. Site directed
adapter ligation can be achieved through the use of 5' modified
oligonucleotides (for example biotinylated or aminated) during cDNA
synthesis that prevent adapter ligation to the 3' end of the cDNA. The
resulting cDNA molecules constitute a 5'-end cDNA library comprised of the
5' non-translated region, the translational start codon AUG coding for a
methionine, followed by the coding region of the gene or genes. The
cDNA molecules are flanked by known DNA sequence on their 5'- and 3'-
ends (Figures 14, 15 and 16).

e. cDNA amplification 

PCR primers to the known 5'- and 3'-end sequences or known
internal sequences can be synthesized and used for the amplification of
either the complete library or specific subpopulations of cDNA using an
extended 5'- or 3'-amplification primer in combination with the primer
located on the opposite side of the cDNA molecules (Figure 11).

f. Primer design for the amplification of gene sub-populations

The sub-population primers contain two portions (Figure 12). The
5'-part of the primer is complementary to a known
sequence, and its 3'-end extends into the unknown cDNA sequence.
Since each position in the cDNA part of the library can hold an
adenosine, cytidine, guanosine or thymidine residue, 4 different
nucleotide possibilities exist for each position. Four different
amplification primers can be synthesized, each containing the same
known sequence and extending by one nucleotide into the cDNA area of
the library. The 4 primers differ only at their most 3'-nucleotide, being
either A, C, G or T. If we suppose that each nucleotide (A, C, G, T) is
equally represented in a stretch of DNA, each one of the 4 amplification
primers will amplify one quarter of the total genes represented in the
cDNA library. By extending the amplification primer sequence further and
increasing the number of amplification primers, the complexity of the
amplification products can be further reduced. Extending the sequence
by 2 nucleotides requires the synthesis of 16 different primers, decreasing
the complexity 16-fold; 3 nucleotides require 64 different primers; and an
n-nucleotide extension requires 4^n different primers.
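The primer arithmetic above can be sketched in a few lines. This is an illustration only: the known adapter sequence `KNOWN` and the short cDNA start sequences are hypothetical, and matching is simplified to a prefix comparison on the same strand rather than a model of real primer annealing.

```python
from itertools import product

def extension_primers(known, n):
    """Known sequence followed by every possible n-nucleotide 3' extension."""
    return [known + "".join(ext) for ext in product("ACGT", repeat=n)]

def pool_for(primer, known, cdnas):
    """cDNAs whose first nucleotides match the primer's 3' extension."""
    ext = primer[len(known):]
    return [c for c in cdnas if c.startswith(ext)]

KNOWN = "GGATCC"  # hypothetical known adapter sequence
cdnas = ["ATGGCA", "ATGCCT", "CTGAAA", "GTGTTT", "TTGACC"]  # hypothetical cDNA starts

for n in (1, 2, 3):
    print(n, len(extension_primers(KNOWN, n)))  # 4, 16 and 64 primers

print(pool_for(KNOWN + "A", KNOWN, cdnas))  # sub-population seen by the A primer
```

Because every cDNA begins with exactly one of the 4^n possible extensions, the primers split the library into non-overlapping sub-populations.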

g. PCR amplification 
PCR amplification entails mixing template DNA, two appropriate
oligonucleotide primers (5'- and 3'-end primers located in the known
added sequences directed in complementary orientation), Taq or other
thermostable DNA polymerases, deoxyribonucleoside triphosphates
(dNTPs), and a buffer. The PCR products are analyzed after cycling on
DNA gels or through analysis on an ABI 377 using the GeneScan analysis
software. These analysis methods allow the determination of the
complexity of the amplified cDNA pool.
h. Production of a protein expression library

Each amplified cDNA library sub-population is cloned 5' to 3' in a
bacterial (E. coli, etc.) or eukaryotic (Baculovirus, yeast, mammalian)
protein expression system. The gene is introduced with its own
translational initiation signal and a 6xHis tag in all 3 frames. For example:
the cDNA is restricted with two different, rare cutting restriction
enzymes (5'-end BglII and 3'-end NotI) and cloned in the 5' to 3'
orientation in the Baculovirus transfer vector pVL1393 under the direct
control of the polyhedrin promoter.

i. Protein expression
Linearized Baculovirus DNA and recombinant transfer-vector DNA
are cotransfected into susceptible Sf9 insect cells with calcium
phosphate. For cotransfection, 10 µg of purified plasmid DNA is
prepared. An initial recombinant Baculovirus stock is prepared and
Sf9 cells are infected for recombinant protein production.

j. Protein purification
The expressed recombinant proteins contain an affinity tag
(for example, a 6xHis tag). They are purified on Ni-NTA agarose.
Approximately 1 to 2 mg of 6xHis recombinant fusion protein is routinely
obtained per liter of insect cell culture.

k. Purification tag removal
If the expression vector or the amplification primer was
constructed with a proteolytic cleavage site for thrombin, the purification
tag can be removed from the recombinant proteins after the protein
affinity purification step.

II. Antibody generation by immunization of different animals with 
individual protein mixtures 

3. Preparation of antibody protein capture reagents
A purified protein preparation translated from a pool of cDNAs is
injected intramuscularly, intradermally, or subcutaneously in the presence
of adjuvant into an animal of the chosen species (e.g., rabbit). Booster
immunizations are started 4 to 8 weeks after the priming immunization
and continued at 2- to 3-week intervals. The polyclonal antiserum is
purified using standard methods known to those skilled in the art.

The purified antibody batches can be used directly as protein
capture reagents without modification. In this case the antibody batches
from different animals have to be kept separate (each batch is one
capture reagent).

III. Antibody proteins are isolated and conjugated with nucleic acid
sequences that correspond to the original antigen preparation,
resulting in the antibody capture reagents

Generation of bi-functional capture/sorting molecules for sorting of
the complex protein mixture on a solid phase.

The glycosylated CH2 domains of the polyclonal antibodies are
conjugated to 5' modified oligonucleotides using standard conjugation
methods. The resulting molecule has one protein capture moiety
(antibody) and one nucleic acid moiety (oligonucleotide) (Figure 13).

The antibody batches obtained after immunization of an animal with a
reduced complexity protein pool are conjugated with one
oligonucleotide sequence. Antibodies produced from multiple
immunization events with different protein pools are each conjugated to an
oligonucleotide with a different sequence (Figure 13).

4. Capture of target proteins using reactivity functionality and
sorting by oligonucleotide hybridization

Two different methods have been developed for making oligonucleotides
bound to a solid support: they can be synthesised in situ, or
presynthesised and attached to the support. In either case, it is possible
to use the support-bound oligonucleotides in a hybridization reaction with
oligonucleotides in the liquid phase to form duplexes; the excess of
oligonucleotide in solution can then be washed away.

The support can take the form of particles, for example, glass
spheres or magnetic beads. In this case the reactions could be carried
out in tubes, or in the wells of a microtitre plate. Methods for
synthesising oligonucleotides and for attaching presynthesised
oligonucleotides to these materials are known (see, e.g., Stahl et al.
(1988) Nucleic Acids Research 16(7):3025-3039).

a. Preparation of amine-functionalized solid support
An oligonucleotide of a defined sequence is synthesized on an
amine-functionalized glass support. An amine function was attached at
discrete locations on the glass slide using a solution of 700 µl of
H2N(CH2)3Si(OCH2CH3)3 in 10 ml of 95% ethanol at room temperature for
3 hours. The treated support was washed once with methanol and then once
with ethyl ether. The support was dried at room temperature and then baked
at 110 °C for 15 hours. It was then washed with water, methanol and
water, and then dried.

The glass slide was reacted for 30 minutes at room temperature
with 250 mg (1 millimole) of phthalic anhydride in the presence of 2 ml
of anhydrous pyridine and 61 mg of 4-dimethylaminopyridine.

The product was rinsed with methylene dichloride, ethyl alcohol
and ether, and then dried. The products on the slide were reacted with
330 mg of dicyclohexylcarbodiimide (DCC) for 30 minutes at room
temperature. The solution was decanted and replaced with a solution of
117 mg of 6-amino-1-hexanol in 2 ml of methylene dichloride and then
left at room temperature for approximately 8 hours.

b. Oligonucleotide synthesis on a solid support
The amine-functionalized solid support was prepared for
oligonucleotide synthesis by treatment with 400 mg of succinic anhydride
and 244 mg of 4-dimethylaminopyridine in 3 ml of anhydrous pyridine for
18 hours at room temperature. The solid support was treated with 2 ml of
DMF containing 3 millimoles (330 mg) of DCC and 3 millimoles (420 mg)
of p-nitrophenol at room temperature overnight. The slide was washed
with DMF, CH3CN, CH2Cl2 and ethyl ether. A solution of 2 millimoles
(234 mg) of H2N(CH2)6OH in 2 ml of DMF was reacted with the slide
overnight. The product of this reaction was a support,
-O(CH2)3NHCO(CH2)2CONH(CH2)5CH2OH. The slide was washed
with DMF, CH3CN, methanol and ethyl ether.

The functionalized ester resulting from the preparation of the glass
support was used for the synthesis of an oligonucleotide sequence. Each
nucleoside residue was added as a phosphoramidite according to known
procedures (see, e.g., U.S. Patent Nos. 4,725,677 and 5,198,540, and
RE34,069; see also Caruthers et al., U.S. Patent No. 4,415,732).

5. Protein analysis of the captured proteins and complex protein
sample comparison

The purified antibody batches can be either 1) directly attached to
a solid surface and incubated with protein samples, or 2) incubated with
the samples and subsequently bound to a solid support without using the
capture compound; alternatively, 3) the capture compound can be used to
capture its corresponding protein in a sample and subsequently sort the
captured proteins through specific nucleotide hybridization (Figure 14).

IV. Antisense oligonucleotide capture reagents are immobilized in
discrete and known locations on a solid surface to create an
antibody capture array

6. Preparation of capture array surface

5'-aminated oligonucleotides are synthesized using
phosphoramidite chemistry and attached to N-oxysuccinimide esters.
The attached oligonucleotide sequences are complementary to the sorting
oligonucleotides of the bi-functional antibody molecules (Figure 13).
Proteins are captured through nucleic acid hybridization of their sorting
oligonucleotide to the complementary oligonucleotide attached to the
solid surface.

V. The antibody capture reagents are added to the total protein
mixture (reactivity step). The reaction mixture is then added to the
solid surface array under conditions that allow oligonucleotide
hybridization (sorting step).

7. Capture compound/protein capture and sorting

The bi-functional antibodies are incubated with the protein
sample under conditions that allow the antibodies to bind to their
corresponding antigen. The bi-functional antibody molecule with the
captured protein is added to the oligonucleotide prepared capture array.
Under standard DNA annealing conditions that do not disrupt the
antigen-antibody binding, the bi-functional antibody will hybridize through
its nucleic acid moiety to the complementary oligonucleotide.

VI. The captured protein is identified using MALDI mass spectrometry

8. Analysis of the captured proteins

The attached proteins are analyzed using standard protein analysis
methods, such as mass spectrometry.

EXAMPLE 4 

Synthesis of Trityl based Protein capture compounds (see Figure 15)

A. Synthesis of 2-(4-bromophenyl)-4,4-dimethyl-1,3-oxazoline (1)

To 4-bromobenzoic acid (50 g, 0.25 mol) placed in a 500 mL round
bottom flask fitted with a reflux condenser was added 150 mL of thionyl
chloride, and the mixture was refluxed for 8 h. The excess thionyl chloride
was removed under vacuum and the white solid obtained was dissolved in
100 mL of dry CH2Cl2 and kept in an ice bath. To this ice-cooled solution
of bromobenzoyl chloride was added dropwise 45 g of
2-amino-2-methylpropan-1-ol dissolved in another 100 mL of dry CH2Cl2
with stirring over a period of 1 h. The ice bath was removed and the
reaction mixture was stirred at room temperature overnight. The
precipitated white solid was filtered and washed several times with CH2Cl2
(4x100 mL). The combined CH2Cl2 was removed on a rotary evaporator and
the solid obtained was slowly dissolved in 150 mL of thionyl chloride and
refluxed for 3 h. The excess SOCl2 was evaporated to one-sixth of its
volume and poured into 500 mL of dry ether cooled in an ice bath and kept
in the refrigerator overnight. The ether was removed and the precipitated
hydrochloride was dissolved in 500 mL of cold water. The aqueous solution
was carefully neutralized using 20% KOH solution under cold conditions
(ice bath) and the brown oily residue that separated was extracted with
CH2Cl2 (3x200 mL) and dried over anhydrous Na2SO4. Removal of the
solvent gave 42 g (67%) of 2-(4-bromophenyl)-4,4-dimethyl-1,3-oxazoline
as a yellow oil. 1H-NMR (500 MHz, CDCl3) δ ppm: 1.36 (s, 6H), 4.08
(s, 2H), 7.52 (d, 2H), 7.79 (d, 2H). Mass: 254.3 (M+).
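The reported yield for step A can be cross-checked from the masses given. The sketch below is only a convenience calculation: the molecular formulas C7H5BrO2 (4-bromobenzoic acid) and C11H12BrNO (oxazoline 1) are derived here from the compound names rather than stated in the text, standard atomic weights are used, and the small formula parser handles only simple formulas without parentheses.

```python
import re

# Standard atomic weights (g/mol) for the elements needed here.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "Br": 79.904}

def molar_mass(formula):
    """Molar mass of a simple formula such as 'C7H5BrO2' (no parentheses)."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total

def percent_yield(mass_start, formula_start, mass_product, formula_product):
    """Percent yield for a 1:1 conversion of starting material to product."""
    mol_start = mass_start / molar_mass(formula_start)
    mol_product = mass_product / molar_mass(formula_product)
    return 100.0 * mol_product / mol_start

# 50 g of 4-bromobenzoic acid giving 42 g of oxazoline 1.
print(round(molar_mass("C11H12BrNO"), 1))
print(round(percent_yield(50, "C7H5BrO2", 42, "C11H12BrNO")))
```

The computed molar mass of the product is also in line with the mass spectral value of 254.3 quoted for (M+), and the computed yield is within a percentage point of the reported 67%.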

B. Synthesis of phenyl-{3-[2-(tetrahydropyran-2-yloxy)-ethoxy]-
phenyl}-methanone (2)

1. Method A: To a 100 mL two-neck round bottom flask
charged with 550 mg (8 mmol) of NaOEt in 20 mL of dry DMF was added
3-hydroxybenzophenone (1 g, 5 mmol) under argon atmosphere. The
reaction was stirred at room temperature for 10 min, and
2-bromoethoxytetrahydropyran (1 g, 5 mmol) dissolved in 5 mL of dry DMF
was added dropwise. The reaction mixture was heated at 60 °C overnight,
cooled, poured into ice water and extracted with CH2Cl2 (2x50 mL).
The combined solvent was dried over anhydrous Na2SO4 and evaporated.
The crude residue obtained was purified by silica gel column
chromatography using a hexane/EtOAc (9:1) mixture as the eluent. Yield:
680 mg (42%).

2. Method B: To a stirred mixture of 3-hydroxybenzophenone
(1 g, 5 mmol), anhydrous K2CO3 (3 g, 23 mmol) and NaI (500
mg) in dry acetone (40 mL) was added 2-bromoethoxytetrahydropyran
(1 g, 5 mmol) dissolved in 10 mL of dry acetone, and the mixture was
refluxed for 20 h. The precipitate was filtered and washed with acetone
(3x20 mL). The combined filtrate was evaporated and the yellowish residue
obtained was purified by silica gel column chromatography using a
hexane/EtOAc (9:1) mixture as the eluent. Yield: 55-60%. 1H-NMR (500
MHz, CDCl3) δ ppm: 1.5-1.63 (m, 4H), 1.72 (m, 1H), 1.82 (m, 1H), 3.52
(m, 1H), 3.8-3.9 (m, 2H), 4.07 (m, 1H), 4.21 (m, 2H), 4.70 (t, 1H),
7.15 (d, 1H), 7.37 (m, 3H), 7.47 (t, 2H), 7.58 (t, 1H), 7.80 (d, 1H).
Mass: 327.2 (M+), 349.3 (M + Na+).

C. Grignard reaction: Synthesis of 2-{4'-(3-(2-tetrahydropyran-
2-yloxy)ethoxy)phenyl-4"-phenyl)}-4,4-dimethyl-1,3-
oxazoline, 3

In a 100 mL two-necked round-bottomed flask fitted with a reflux
condenser were placed activated Mg turnings (720 mg, 30 mmol), a few
crystals of I2 and molecular sieves (4 Å) under argon. To this mixture 10
mL of THF was added. The mixture was heated to 50 °C and
2-(4-bromophenyl)-4,4-dimethyl-1,3-oxazoline (6.5 g, 26 mmol) dissolved
in 15 mL of dry THF, a catalytic amount of CH3I, Red-Al and CCl4 were
added with stirring and refluxed for 3 h. The reaction mixture was then
cooled to room temperature, and
phenyl-{3-[2-(tetrahydropyran-2-yloxy)-ethoxy]-phenyl}-methanone (5.1 g,
15.6 mmol) dissolved in 15 mL of dry THF was added; the mixture was again
refluxed for 3 h and cooled, and 3 mL of water was added.
The solvent was removed on a rotary evaporator, and the residue was
extracted with CHCl3 (3x100 mL) and dried over anhydrous Na2SO4. The
residue obtained on removal of the solvent was separated by silica gel
column chromatography using hexane/EtOAc (7:3) as the eluent. Evaporation
of the column fraction yielded
2-{4'-(3-(2-tetrahydropyran-2-yloxy)ethoxy)phenyl-4"-phenyl)}-4,4-dimethyl-1,3-oxazoline
(3) as a yellow crystalline solid (1.4 g, 18%). 1H-NMR (500 MHz, CDCl3)
δ ppm: 1.37 (s, 6H), 1.5-1.63 (m, 4H), 1.68 (m, 1H), 1.80 (m, 1H), 2.85
(s, 1H, -OH), 3.49 (m, 1H), 3.75 (m, 1H), 3.85 (m, 1H), 3.97 (m, 1H),
4.09 (m, 4H), 4.66 (t, 1H), 6.80 (d, 1H), 6.84 (d, 1H), 6.88 (s, 1H),
7.18-7.31 (m, 6H), 7.34 (d, 2H), 7.87 (d, 2H). Mass: 502.6 (M + 1),
524.5 (M + Na+).

D. 4,4-Dimethyl-2-[4-(phenyl-[2-(tetrahydro-pyran-2-yloxy)-
ethoxy]-{3-[2-(tetrahydro-pyran-2-yloxy)-ethoxy]-phenyl}-
methyl)-phenyl]-4,5-dihydrooxazole, 4

To a stirred mixture of
2-{4'-(3-(2-tetrahydropyran-2-yloxy)ethoxy)phenyl-4"-phenyl)}-4,4-dimethyl-1,3-oxazoline
(3, 200 mg, 0.4 mmol) and NaH (100 mg, 4 mmol) in 3 mL of dry DMF at r.t.
was added 2-(2-bromoethoxy)tetrahydro-2H-pyran (500 mg, 2.4 mmol) and the
reaction was allowed to stir at r.t. for 2 h. The reaction mixture was then
poured into ice water, extracted with CH2Cl2 (3x20 mL) and dried
over anhydrous Na2SO4. Evaporation of the solvent gave 4 as a yellow
oily residue in quantitative yield.

E. 4-{(2-Hydroxy-ethoxy)-[3-(2-hydroxy-ethoxy)-phenyl]-phenyl-
methyl}-benzoic acid, 5

A solution of 4 (360 mg) in 3 mL of 80% aqueous acetic acid was
heated at 75 °C for 12 h. The solution was then evaporated and the
residue obtained was refluxed with 20% NaOH/EtOH (1:1, v/v, 3 mL) for
2 h. The solvent was removed, 10 mL of ice-cooled water was added to
the residue, and the aqueous solution was acidified with 1 N HCl. The
precipitated yellow solid was filtered, washed several times with
water and dried under high vacuum. Yield: 270 mg (100%, quantitative).

F. 4-{(2-Hydroxy-ethoxy)-[3-(2-hydroxy-ethoxy)-phenyl]-phenyl-
methyl}-benzoic acid 2,5-dioxo-pyrrolidin-1-yl ester, 6

1. Method A: To a stirred solution of trityl acid 5 (110
mg, 0.26 mmol) and N-hydroxysuccinimide (80 mg, 0.7 mmol) in dry
1,4-dioxane (2 mL) was added
1-ethyl-3-(3-dimethylaminopropyl)carbodiimide hydrochloride (EDC, 105 mg,
0.5 mmol) dissolved in 2 mL of water. The reaction mixture was stirred
for 12 h at r.t., then extracted with CHCl3 (3x10 mL) and dried over
anhydrous Na2SO4. The solid obtained on evaporation of the solvent was
purified by preparative TLC. Yield: 5 mg.

2. Method B: To a stirred solution of trityl acid 5 (12 mg,
0.03 mmol) in dry THF (4 mL) was added dicyclohexylcarbodiimide (DCC,
10 mg, 0.05 mmol). The reaction mixture was stirred for 30 min at r.t.,
N-hydroxysuccinimide (11.5 mg, 0.1 mmol) and a catalytic amount of
DMAP were added, and the mixture was allowed to stir overnight. The
solvent was removed on a rotary evaporator and the solid obtained was
dissolved in dry ether. The precipitated DCU was filtered off and the
ether was evaporated. The crude solid obtained was purified by
preparative TLC. Yield: 7 mg (50%). 1H-NMR (500 MHz, CDCl3) δ ppm: 2.90
(s, 4H), 3.92 (t, 4H), 4.02 (t, 4H), 6.83 (m, 2H), 7.25 (m, 3H), 7.34
(m, 4H), 7.50 (d, 2H), 8.0 (d, 2H).

G. 4,4-Dimethyl-2-[4-(phenyl-(3-phenyl-propoxy)-{3-[2-
(tetrahydro-pyran-2-yloxy)-ethoxy]-phenyl}-methyl)-phenyl]-
4,5-dihydro-oxazole, 7

To a stirred mixture of
2-{4'-(3-(2-tetrahydropyran-2-yloxy)ethoxy)phenyl-4"-phenyl)}-4,4-dimethyl-1,3-oxazoline
(3, 300 mg, 0.6 mmol) and NaH (100 mg, 4 mmol) in 3 mL of dry DMF at r.t.
was added 3-bromo-1-phenylpropane (250 mg, 1.2 mmol) and the reaction was
allowed to stir at r.t. for 2 h. The reaction mixture was then poured into
ice water, extracted with CH2Cl2 (3x20 mL) and dried over
anhydrous Na2SO4. Evaporation of the solvent gave 7 as a yellow
residue in quantitative yield.

H. 4-[[3-(2-Hydroxy-ethoxy)-phenyl]-phenyl-(3-phenyl-propoxy)-
methyl]-benzoic acid, 8

A solution of 7 (550 mg) in 3 mL of 80% aqueous acetic acid was
heated at 75 °C overnight. The solution was then evaporated and the
residue obtained was refluxed with 20% NaOH/EtOH (1:1, v/v, 3 mL) for
2 h. The solvent was removed, 10 mL of ice-cooled water was added to
the residue, and the aqueous solution was acidified with 1 N HCl,
extracted with CH2Cl2 (60 mL) and dried over anhydrous Na2SO4.
Evaporation of the solvent gave a yellow solid. Yield: 485 mg (quantitative).

I. 4-[[3-(2-Hydroxy-ethoxy)-phenyl]-phenyl-(3-phenyl-propoxy)-
methyl]-benzoic acid 2,5-dioxo-pyrrolidin-1-yl ester, 9

To a stirred solution of trityl acid 8 (200 mg, 0.42 mmol) in dry THF
(6 mL) was added dicyclohexylcarbodiimide (DCC, 206 mg, 1 mmol). The
reaction mixture was stirred for 30 min at r.t., N-hydroxysuccinimide
(70 mg, 0.6 mmol) and a catalytic amount of DMAP were added, and the
mixture was allowed to stir overnight. The solvent was removed on a
rotary evaporator and the solid obtained was dissolved in dry ether. The
precipitated DCU was filtered off and the ether was evaporated. The crude
solid obtained was separated by silica column chromatography using
CH2Cl2. Yield: about 120 mg. 1H-NMR (500 MHz, CDCl3) δ ppm: 1.70 (m,
2H), 1.9 (t, 2H), 2.9 (s, 4H), 3.50 (t, 2H), 3.9 (t, 2H), 4.0 (t, 2H),
6.85 (m, 4H), 7.25 (m, 4H), 7.32 (m, 5H), 7.51 (m, 3H), 8.09 (d, 2H).

J. 1-{4-[[3-(2-Hydroxy-ethoxy)-phenyl]-phenyl-(3-phenyl-
propoxy)-methyl]-benzoyl}-pyrrole-2,5-dione, 10

To a stirred solution of trityl acid 8 (280 mg, 0.42 mmol) in dry THF
(6 mL) was added dicyclohexylcarbodiimide (DCC, 400 mg, 1.95 mmol).
The reaction mixture was stirred for 30 min at r.t., maleimide
(100 mg, 1.1 mmol) and a catalytic amount of DMAP were added, and the
mixture was allowed to stir overnight. The solvent was removed on a
rotary evaporator and the solid obtained was dissolved in dry ether. The
precipitated DCU was filtered off and the ether was evaporated. Part of
the product was purified by preparative TLC. Yield: 12 mg. 1H-NMR (500
MHz, CDCl3) δ ppm: 1.78 (m, 2H), 1.95 (m, 2H), 2.9 (s, 4H), 3.51 (m,
2H), 3.93 (t, 2H), 4.02 (t, 2H), 6.8 (m, 5H), 7.25 (m, 5H), 7.29 (m, 5H),
7.37 (m, 3H), 7.48 (d, 2H). Mass: 561.3 (M+).

EXAMPLE 5



This Example shows addition of a selectivity function onto a
capture compound possessing an N-hydroxysuccinimidyl ester reactivity
function. Compounds with a sorting function can be prepared by using an
appropriate analog of compound 11 below.

Procedure for Mitsunobu Reaction of Trityl Capture Reagents

1.1 equivalents of triphenylphosphine are added to a reaction vial
and dissolved in 1.0 ml THF. 1.1 equivalents of diisopropyl
azodicarboxylate are added to this solution and mixed for 5 minutes.
1 equivalent of 11 is added and stirred for 5 minutes. The nucleophile
(R-OH) is added and stirred overnight at 50 °C. The products are purified
by preparative TLC.



[Structures of compounds 11 and 12]

EXAMPLE 6
Cell synchronization

H460 lung cancer and SW480 colon cancer cells were synchronized in
G0/G1 with simvastatin and lovastatin (HMG-CoA reductase inhibitors),
which can enrich a cancer cell population in G0/G1. Cells arrested in
G2/M phase were obtained by treatment with nocodazole.
Cell Culture and Reagents 

The SW480 cell line was cultured in Dulbecco's modified Eagle
medium (DMEM), the H460 cell line (ATCC, Manassas, VA) was cultured
in RPMI 1640, whereas the FK101 was cultured in serum-free medium
(SFM) with 5% CO2 at 37 °C. The cell culture media were supplemented
with 10% fetal bovine serum (FBS), 2 mM L-glutamine,
penicillin (100 U/ml) and streptomycin (100 U/ml).

Synchronization of Cells

H460 and SW480 cells enriched in G1 phase were obtained after
incubation with serum-free medium for 48 hours, or treatment with
U0126, lovastatin or simvastatin. Cells in S phase were synchronized by
incubating cells with medium containing no serum for 24 hours, followed
by aphidicolin treatment (2 µg/ml) for 20 hours and release of cells from
aphidicolin for 3 hours. Cells arrested in G2/M phase were obtained by
treatment with nocodazole (0.4-0.8 µg/ml) for 16-20 hours.



EXAMPLE 7

Synthesis of (4,4'-bisphenyl-hydroxymethyl)benzoyl maleimide derivatives

[Reaction scheme: i. SOCl2, reflux, 1 h; ii. maleimide, dry THF, 2 h,
r.t.; iii. R-OH, dry pyridine, r.t., overnight]

General Procedure: A solution of 4-(diphenylhydroxymethyl)benzoic
acid (0.04 mM) in 1 mL of SOCl2 was refluxed for 1 h, and the excess
SOCl2 was removed under high vacuum. To the yellow solid residue
obtained was added maleimide (0.045 mM) dissolved in dry, freshly
distilled THF (1 mL), and the mixture was stirred at room temperature
for 2 h. The solvent was removed, and the corresponding alcohol (ROH,
2-5 fold excess) dissolved in dry pyridine (1 mL) was added with
stirring. After the reaction mixture had stirred at room temperature
overnight, the solution was extracted with CH2Cl2 (5 x 3 mL) and dried
over anhydrous Na2SO4. The residue obtained on evaporation of the
solvent was separated by preparative TLC (silica gel, 500 µm plate) to
give the product 1 in 50-60% yield. The trityl derivatives 1 were fully
characterized by 1H NMR and mass spectral data.
EXAMPLE 8

Succinimidyl Ester Trityl Capture Compound Synthesis
Procedure 1

[Reaction scheme: 4-(diphenylhydroxymethyl)benzoic acid +
N-hydroxysuccinimide]

4-(Diphenylhydroxymethyl)benzoic acid was reacted with 2
equivalents of N-hydroxysuccinimide using 1.2 equivalents of
diisopropylcarbodiimide. The desired product was purified by flash silica
chromatography and characterized by ESI mass spectrometry.




125 µmoles of the product from above was added to 1.0 ml
acetyl chloride. This reaction mixture was stirred at room temperature
for 1 hour and evaporated three times with toluene to remove excess
acetyl chloride. Equal volumes of the reaction mixture were added to
nucleophiles (see below) dissolved in 1.0 M pyridine/THF. These reaction
mixtures were mixed at 60 °C for 2 hours. The resulting products were
extracted from CHCl3 and 10% HOAc. Products were purified by
preparative TLC (ether) and characterized by MS and NMR.

[Structures of the ROH nucleophiles; recoverable labels: OMe,
O-n-heptyl, n-pentadecyl]

Procedure 2 




[Reaction scheme: SOCl2; N-hydroxysuccinimide; ROH, pyridine]

1.64 mmoles of 4-(diphenylhydroxymethyl)benzoic acid was
dissolved in 5 ml thionyl chloride. This reaction mixture was heated to
79 °C and stirred for 75 minutes. The thionyl chloride was removed under
a N2 (g) stream. 1.3 equivalents of N-hydroxysuccinimide dissolved in dry
THF was added to this dried reaction mixture and stirred for 1 hour. The
THF solvent was removed under a N2 (g) stream. The product was dissolved
in dry pyridine. Equal volumes of this solution were added to
nucleophiles dissolved in pyridine (see below). The resulting products
were extracted from CHCl3 and 10% HOAc. Products were purified by
preparative TLC (ether) and characterized by MS and NMR.
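
The reagent quantities in Procedure 2 are stated in equivalents relative to
the starting acid. As an illustration only, the following sketch converts
the stated 1.3 equivalents of N-hydroxysuccinimide into an amount and a
mass. The function name is hypothetical, and the molecular weight of
N-hydroxysuccinimide (C4H5NO3, about 115.09 g/mol) is supplied here as an
assumption; it is not stated in the procedure.

```python
def mmol_for_equivalents(limiting_mmol: float, equivalents: float) -> float:
    """Amount (mmol) of a reagent used at a given number of equivalents
    relative to the limiting reagent."""
    if limiting_mmol <= 0 or equivalents <= 0:
        raise ValueError("amounts must be positive")
    return limiting_mmol * equivalents


ACID_MMOL = 1.64   # 4-(diphenylhydroxymethyl)benzoic acid, from Procedure 2
NHS_EQUIV = 1.3    # equivalents of N-hydroxysuccinimide, from Procedure 2
NHS_MW = 115.09    # g/mol for C4H5NO3 (assumption, not given in the text)

nhs_mmol = mmol_for_equivalents(ACID_MMOL, NHS_EQUIV)
nhs_mg = nhs_mmol * NHS_MW  # mmol * g/mol = mg

print(f"N-hydroxysuccinimide: {nhs_mmol:.3f} mmol, about {nhs_mg:.0f} mg")
```

The same helper applies to any reagent whose loading is given in
equivalents, for example the 1.2 equivalents of carbodiimide in Procedure 1
once the amount of acid is fixed.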

[Structures of the ROH nucleophiles; recoverable labels: OMe,
O-n-heptyl, n-pentadecyl]


EXAMPLE 9 


This example shows exemplary capture binding assays and the effects
of selectivity functions on binding. This example shows that changing
selectivity can alter reactivity of the capture compound, thereby providing
a means to probe biomolecule structures and to permit sorting or diversity
reduction using the collections. In this example, the core group of the
capture compounds is a trityl group and the reactive group is succinimide,
which interacts with a primary amine. Compound 1341 is a non-selective
compound that has a reactivity group, but no selectivity group.
Compound 1343 (see Figure 20) is exemplary of a compound in which
the selectivity group is -OH. As the selectivity group changes there is a
difference in reactivity on the target proteins (lysozyme, cytochrome C
and ubiquitin).
Lysozyme 

Three different capture compounds (designated HKC 1343, 1349,
1365; chemical structure of each compound is listed below the
Compound name) were reacted individually with Lysozyme (Accession
number P00698; Figure 20b). The capture experiments were analyzed
using MALDI-TOF Mass Spectrometry. Binding was performed in 20 µL
sample volumes with a 5 µM Lysozyme concentration in 25 mM HEPES
buffer solution, pH 7.0. The trityl-based capture compounds were added
to the protein solution at a 10 µM concentration. The binding reaction
was incubated at room temperature for 30 minutes. The reaction was
quenched using 1 µL of a 100 mM TRIZMA base solution.

The capture compound-protein binding mixture was prepared
for mass spectrometry by mixing a 1 µL aliquot of a binding reaction with
1 µL of a 10 mg/mL sinapinic acid in 30% aqueous acetonitrile. The
sample was deposited as a 500 nL spot on the surface of the mass target
plates and air-dried before mass spectrometric analysis. The results of
the mass spectrometry analysis, which are shown in Figure 20b,
demonstrate that addition of selectivity groups to compounds permits
alterations in the binding specificity of capture compounds.
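
As a rough cross-check of the sample handling described above, the amount
of protein deposited on each MALDI spot can be estimated from the stated
volumes and concentrations. The following is a minimal sketch, assuming
that volumes are additive, that the 20 µL binding volume already includes
the capture compound, and that no material is lost on transfer. The
function and parameter names are illustrative and are not part of the
described method.

```python
def protein_on_spot_pmol(protein_uM: float, reaction_uL: float,
                         quench_uL: float, aliquot_uL: float,
                         matrix_uL: float, spot_uL: float) -> float:
    """Estimate the protein amount (pmol) in one MALDI spot.

    Units: uM * uL = pmol, so concentrations stay in uM and volumes in uL.
    """
    protein_pmol = protein_uM * reaction_uL                  # total protein
    conc_quenched = protein_pmol / (reaction_uL + quench_uL)  # after quench
    # The aliquot is diluted by the matrix solution before spotting.
    conc_spotted = conc_quenched * aliquot_uL / (aliquot_uL + matrix_uL)
    return conc_spotted * spot_uL


# Values taken from the binding and sample preparation steps above.
spot_pmol = protein_on_spot_pmol(protein_uM=5.0, reaction_uL=20.0,
                                 quench_uL=1.0, aliquot_uL=1.0,
                                 matrix_uL=1.0, spot_uL=0.5)
molar_excess = 10.0 / 5.0  # capture compound (10 uM) over protein (5 uM)

print(f"about {spot_pmol:.2f} pmol protein per spot, "
      f"{molar_excess:.0f}-fold compound excess")
```

Under these assumptions each 500 nL spot carries on the order of 1 pmol of
protein, with the capture compound in two-fold molar excess.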
Cytochrome C 

Four different capture compounds (designated HKC 1341, 1343,
1349, 1365; chemical structure of each compound is listed below the
Compound name) were reacted individually with Cytochrome C
(accession number: P00006, Figure 20c). The capture experiments were
analyzed using MALDI-TOF Mass Spectrometry. Binding was performed in
20 µL sample volumes with a 5 µM Cytochrome C concentration in 25
mM HEPES buffer solution, pH 7.0. The trityl-based capture compounds
were added to the protein solution at a 10 µM concentration. The binding
reaction was incubated at room temperature for 30 minutes. The reaction
was quenched using 1 µL of a 100 mM TRIZMA base solution. The
capture compound-protein binding mixture was prepared for mass
spectrometry analysis by mixing a 1 µL aliquot of the binding reaction
with 1 µL of a 10 mg/mL sinapinic acid in 30% aqueous acetonitrile. The
sample was deposited as a 500 nL spot on the surface of mass target
plates and subsequently air-dried before mass spectrometric analyses.
The results of the mass spectrometry analysis, which are shown in Figure
20c, demonstrate that addition of selectivity groups to compounds
permits alterations in the binding specificity of capture compounds.
HKC 1343 

One of the exemplary capture compounds (HKC 1343) was
incubated with a mixture of three different proteins (Ubiquitin [P02248],
Cytochrome C [P00006] and Lysozyme [P00698]; see Figure 20d). The
capture experiment was analyzed using MALDI-TOF Mass Spectrometry.
The binding reactions were performed in a 20 µL sample volume with all
three proteins at 5 µM concentrations in 25 mM HEPES buffer solution,
pH 7.0. The trityl-based capture compound was added to the protein
solution at a 25 µM concentration. The binding reaction was
incubated at room temperature for 30 minutes and the reaction quenched
using 1 µL of a 100 mM TRIZMA base solution. The capture
compound-protein binding mixture was prepared for mass spectrometry by
mixing a 1 µL aliquot of the binding reaction with 1 µL of a 10 mg/mL
sinapinic acid in 30% aqueous acetonitrile. The sample was deposited as a
500 nL spot on the surface of mass target plates and air-dried before
mass spectral analysis. The results of the mass spectrometry analysis,
which are shown in Figure 20d, demonstrate that a plurality of
compounds bound to a single capture agent that is selective can be
identified by mass spectrometric analysis.
HKC 1365 

Another of the exemplary capture compounds (HKC 1365) was
incubated with a mixture of three different proteins (Ubiquitin [P02248],
Cytochrome C [P00006] and Lysozyme [P00698]; see Figure 20d). The
capture experiment was analyzed using MALDI-TOF Mass
Spectrometry. The binding reactions were performed in a 20 µL sample
volume with all three proteins at 5 µM concentrations in 25 mM HEPES
buffer solution, pH 7.0. The trityl-based capture compound was added to
the protein solution at a 15 µM concentration. The binding reaction was
incubated at room temperature for 30 minutes, and quenched using 1 µL
of a 100 mM TRIZMA base solution. The capture compound-protein
binding mixture was prepared for mass spectrometry by mixing a 1
µL aliquot of the binding reaction with 1 µL of a 10 mg/mL sinapinic acid
in 30% aqueous acetonitrile. The sample was deposited as a 500 nL
spot on the surface of the mass target plates and air-dried before mass
spectral analyses. The results of the mass spectrometry analysis, which
are shown in Figure 20e, demonstrate that a plurality of compounds
bound to a single capture agent that is selective can be identified by
mass spectrometric analysis.

Reaction of cytochrome C with a non-specific compound 

Figure 20f shows mass spectra for a time course reaction of
cytochrome C with a non-specific compound (HKC 1341). The
succinimide reactive group shows specificity and reactivity with the
lysines of cytochrome c. The top spectrum shows no modification at time
0, the middle spectrum shows 1-9 modifications resulting from binding of
HKC 1341 after 30 minutes, and the bottom spectrum shows, after 24
hours, 17 and 18 modifications, which correspond to the number of
lysines (18) in cytochrome c.
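
The counting of modifications in such a time course can be pictured as a
ladder of expected masses, one step per covalently attached capture
compound. The following is a minimal sketch of that bookkeeping. The
protein mass and the mass added per modification are hypothetical
placeholder values chosen only for illustration; neither is stated in this
example, and only the number of lysines (18) is taken from the text.

```python
def modification_ladder(protein_mass: float, adduct_mass: float,
                        max_sites: int) -> list[float]:
    """Expected masses for 0..max_sites modifications of one protein."""
    return [protein_mass + n * adduct_mass for n in range(max_sites + 1)]


def count_modifications(observed_mass: float, protein_mass: float,
                        adduct_mass: float) -> int:
    """Nearest whole number of modifications explaining an observed mass."""
    return round((observed_mass - protein_mass) / adduct_mass)


PROTEIN_MASS = 12000.0  # Da, hypothetical placeholder for the protein
ADDUCT_MASS = 300.0     # Da, hypothetical mass added per modified lysine
LYSINES = 18            # number of lysines in cytochrome c, from the text

ladder = modification_ladder(PROTEIN_MASS, ADDUCT_MASS, LYSINES)
print(f"{len(ladder)} expected species, from {ladder[0]:.0f} "
      f"to {ladder[-1]:.0f} Da")
```

With real values for the two masses, `count_modifications` would assign
each observed peak to a modification count, which is how a spectrum with
peaks at 17 and 18 modifications can be related to the 18 lysines.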

EXAMPLE 10 

This example shows the selectivity of the capture compounds when a
mixture of capture compounds is reacted with a mixture of proteins.

Materials:

Reaction buffer: 25 mM HEPES, pH 7.0
Proteins: mixture of ubiquitin, cytochrome c and lysozyme
(molar ratio is 1/5/6); the protein stock is made as 5 mg/ml (total
proteins) in reaction buffer.
Capture compounds: HKC 1343 and HKC 1365; stock solution is
1 mM in acetonitrile.
Capturing reaction 

A protein dilution (mixture) is prepared in the reaction buffer at
concentrations of 0.5, 2.5 and 3 µM for ubiquitin, cytochrome c and
lysozyme, respectively. 19.5 µl is used for one capturing reaction. Each
reaction is started by adding 0.5 µl of 1 mM compound stock solution
(final 25 µM). The reaction mixture is incubated at room temperature for
30 min before the reaction is stopped by the addition of 5 mM TRIZMA.
Three different reactions are run. The first two tubes contain HKC
1343 and HKC 1365 individually, and a third one is started by adding
both compounds HKC 1343 and 1365 (final concentration 25 µM for each
compound). After the reaction, 1 µl of each sample is mixed with an equal
volume of matrix and subjected to MALDI analysis. Statistical significance
of the results is ensured by running each reaction sample in triplicate.
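
The concentrations quoted for the capturing reaction follow from a simple
dilution calculation. The sketch below reproduces the stated final compound
concentration and the stated protein molar ratio; it assumes additive
volumes, and the function names are illustrative only.

```python
def final_concentration_uM(stock_mM: float, stock_uL: float,
                           total_uL: float) -> float:
    """Final concentration (uM) after diluting a stock into a total volume."""
    return stock_mM * 1000.0 * stock_uL / total_uL


def molar_ratio(concentrations: list[float]) -> list[float]:
    """Concentrations normalized to the smallest value."""
    smallest = min(concentrations)
    return [c / smallest for c in concentrations]


# 0.5 ul of 1 mM compound stock added to 19.5 ul of protein dilution.
compound_uM = final_concentration_uM(stock_mM=1.0, stock_uL=0.5,
                                     total_uL=19.5 + 0.5)

# Ubiquitin, cytochrome c and lysozyme at 0.5, 2.5 and 3 uM.
ratio = molar_ratio([0.5, 2.5, 3.0])

print(compound_uM, ratio)
```

The result agrees with the 25 µM final compound concentration and the
1/5/6 molar ratio given in the Materials.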
Since modifications will be apparent to those of skill in this art, it is
intended that this invention be limited only by the scope of the appended
claims.
WHAT IS CLAIMED IS:

1. A collection of capture compounds, comprising:

a plurality of capture compounds, comprising sets of capture compounds,
wherein each set of capture compounds includes a moiety X that is
selected to covalently bind to biomolecules or to bind with sufficiently
high affinity so that the resulting complexes of biomolecule/capture
compounds are stable under conditions of mass spectrometry analysis; a
moiety Y that increases the selectivity of the binding by X such that the
capture compound binds to fewer biomolecules when the selectivity
moiety is present than in its absence; a moiety Q, such that each set
contains a different Q, wherein Q permits separation of each set; and a
moiety Z for presenting X, Y and Q.

2. The collection of claim 1, wherein the biomolecules are
proteins.

3. The collection of claim 1 or claim 2, wherein Q permits
separation by arraying of the capture compounds on a solid support by
binding to the surface or a molecule thereon.

4. The collection of claim 1 or claim 2 that includes at least ten
different capture compounds.

5. The collection of claim 3 that includes at least ten different
capture compounds.

6. The collection of claim 1 or claim 2 that includes at least
fifty different capture compounds.

7. The collection of claim 3 that includes at least fifty different
capture compounds.

8. The collection of claim 1 or claim 2 that includes at least
100 different capture compounds.

9. The collection of claim 3 that includes at least 100 different
capture compounds.

10. The collection of any of claims 1-9, wherein Q is a chemical
group for arraying at addressable loci on a solid support.

11. A solid support, comprising the collection of compounds of
any of claims 1-10, wherein each set of compounds is arrayed at a single
locus.

12. A solid support, comprising the collection of compounds of
claim 3, wherein each set of compounds is arrayed at a single locus.

13. The collection of any of claims 1-10, wherein:
component capture compounds are selected from the group
consisting of compounds that have the formula(e):

    Q-Z-(X)m
      |
     (Y)n ,

Q-Z-(X)m and Q-Z-(Y)n;

Z is a moiety that is cleavable prior to or during mass spectrometric
analysis of biomolecules bound to the capture compound;
m is an integer that is 1 to 100; and
n is an integer from 1 to 100.

14. The collection of claim 13, wherein component capture
compounds are selected from the group consisting of

    Q-Z-X
      |
      Y ,

Q-Z-X and Q-Z-Y.

15. The collection of any of claims 1-10, wherein:
component capture compounds are selected from the group
consisting of compounds that have the formula(e):

    Q-Z-(X)m
      |
     (Y)n ,

Q-Z-(X)m and Q-Z-(Y)n;

Z is a moiety that is not cleavable prior to or during mass
spectrometric analysis of biomolecules bound to the capture compound;
m is an integer that is 1 to 100; and
n is an integer from 1 to 100.

16. The collection of claim 15, wherein:
component capture compounds are selected from the group
consisting of compounds that have the formula(e):

    Q-Z-(X)m
      |
     (Y)n ,

Q-Z-(X)m and Q-Z-(Y)n;
m is an integer that is 1 to 100;
n is an integer from 1 to 100; and
Q is an oligonucleotide or oligonucleotide analog that includes a
single-stranded portion of sufficient length "j" to form a stable hybrid
with a base-complementary single stranded nucleic acid molecule or
analog.

17. The collection of any of claims 1-10, wherein:
component capture compounds are selected from the group consisting of

    Q-Z-(X)m
      |
     (Y)n ;

m is an integer that is 1 to 100; and
n is an integer from 1 to 100.
18. The collection of any of claims 1-10, wherein:
component capture compounds are selected from the group
consisting of compounds that have the formula(e):

    Q-Z-X
      |
      Y ,

Q-Z-X and Q-Z-Y; and
Q is an oligonucleotide or oligonucleotide analog that includes a
single-stranded portion of sufficient length "j" to form a stable hybrid
with a base-complementary single stranded nucleic acid molecule or
analog.

19. The collection of any of claims 1-10 and 13-18, wherein:
Q is an oligonucleotide or oligonucleotide analog that includes a
single-stranded portion of sufficient length to form a stable hybrid with a
base-complementary single stranded nucleic acid molecule or analog.

20. The collection of any of claims 1-10 and 13-19, wherein Q
has formula N^1_s B_t N^2_u, wherein:
N^1, B and N^2 are oligonucleotides or oligonucleotide analogs
comprising s, t and u members, respectively;
B is a region of sequence permutations that contains at least two
bases; and
the sum of s, t and u is at least 5.

21. The collection of claim 20, wherein the sum of s, t and u is
about 5 up to about 50.

22. The collection of claim 20 or claim 21, wherein each
member of N^1, B and N^2 is independently selected from among monomer
building blocks of deoxyribonucleic acid, ribonucleic acid, protein nucleic
acid and analogs thereof.

WO 03/092581 PCT/US02/22821

23. The collection of any of claims 1-10 and 13-22, wherein Z is
a photocleavable, acid cleavable, alkaline cleavable, oxidatively cleavable,
or reductively cleavable group.

24. The collection of any of claims 1-10 and 13-23, wherein Z
comprises an insoluble support to which each X, Y and Q is linked either
directly or via a linker.

25. The collection of claim 24, wherein the insoluble support is
selected from the group consisting of a bead, capillary, plate, membrane,
wafer, comb, pin, a wafer with pits, an array of pits or nanoliter wells
and a flat surface for receiving or linking samples at discrete loci.

26. The collection of claim 24 or claim 25, wherein the support
comprises silicon, silica gel, glass, nylon, Wang resin, Merrifield resin,
dextran cross-linked with epichlorohydrin, agarose, cellulose, magnetic
beads, Dynabeads, a metal surface or a plastic material.

27. The collection of any of claims 24-26, wherein Z comprises
hydrophobic beads comprising polystyrene, polyethylene, polypropylene
or teflon, or hydrophilic beads comprising cellulose, dextran cross-linked
with epichlorohydrin, agarose, polyacrylamide, silica gel and controlled
pore glass.

28. The collection of any of claims 1-10 and 13-29, wherein the
Z moiety comprises spacer groups S 1 and/or S 2 , and a cleavable linkage,
where the S 1 and/or S 2 moieties are attached to insoluble support and the
cleavable linkage is attached to S 2 , if present, otherwise to the insoluble
support.

29. The collection of any of claims 1-10, 13-24 and 28, wherein
Z is at least a trivalent moiety and has less than 50 members and is
selected from straight or branched chain alkylene, straight or branched
chain alkenylene, straight or branched chain alkynylene, straight or
branched chain alkylenoxy, straight or branched chain alkylenthio, straight
or branched chain alkylencarbonyl, straight or branched chain
alkylenamino, cycloalkylene, cycloalkenylene, cycloalkynylene,
cycloalkylenoxy, cycloalkylenthio, cycloalkylencarbonyl,
cycloalkylenamino, heterocyclylene, arylene, arylenoxy, arylenthio,
arylencarbonyl, arylenamino, heteroarylene, heteroarylenoxy,
heteroarylenthio, heteroarylencarbonyl, heteroarylenamino, oxy, thio,
carbonyl, carbonyloxy, ester, amino, amido, phosphino, phosphineoxido,
phosphoramidato, phosphinamidato, sulfonamide, sulfonyl, sulfoxide,
carbamate, ureido, and combinations thereof, and is unsubstituted or
substituted with one or more substituents each independently selected
from R 15 ;

each R 15 is independently a monovalent group selected from
straight or branched chain alkyl, straight or branched chain alkenyl,
straight or branched chain alkynyl, cycloalkyl, cycloalkenyl, cycloalkynyl,
heterocyclyl, straight or branched chain heterocyclylalkyl, straight or
branched chain heterocyclylalkenyl, straight or branched chain
heterocyclylalkynyl, aryl, straight or branched chain arylalkyl, straight or
branched chain arylalkenyl, straight or branched chain arylalkynyl,
heteroaryl, straight or branched chain heteroarylalkyl, straight or branched
chain heteroarylalkenyl, straight or branched chain heteroarylalkynyl, halo,
straight or branched chain haloalkyl, pseudohalo, azido, cyano, nitro,
OR 60 , NR 60 R 61 , COOR 60 , C(O)R 60 , C(O)NR 60 R 61 , S(O) q R 60 , S(O) q OR 60 ,
S(O) q NR 60 R 61 , NR 60 C(O)R 61 , NR 60 C(O)NR 60 R 61 , NR 60 S(O) q R 60 , SiR 60 R 61 R 62 ,
P(R 60 ) 2 , P(O)(R 60 ) 2 , P(OR 60 ) 2 , P(O)(OR 60 ) 2 , P(O)(OR 60 )(R 61 ) and P(O)NR 60 R 61 ;
q is an integer from 0 to 2;

each R 60 , R 61 and R 62 is independently hydrogen, straight or
branched chain alkyl, straight or branched chain alkenyl, straight or
branched chain alkynyl, aryl, straight or branched chain aralkyl, straight or
branched chain aralkenyl, straight or branched chain aralkynyl, heteroaryl,
straight or branched chain heteroaralkyl, straight or branched chain
heteroaralkenyl, straight or branched chain heteroaralkynyl, heterocyclyl,
straight or branched chain heterocyclylalkyl, straight or branched chain
heterocyclylalkenyl or straight or branched chain heterocyclylalkynyl;

with the proviso that Z is cleavable prior to or during analysis of
the biomolecule.

30. The collection of any of claims 1-10, 13-24 and 28, wherein
Z is at least a trivalent moiety and is selected from straight or branched
chain alkyl, straight or branched chain alkenyl, straight or branched chain
alkynyl, -(C(R 15 ) 2 ) d -, -O-, -S-, -(CH 2 ) d -, -(CH 2 ) d O-, -(CH 2 ) d S-, >N(R 15 ),
-(S(O) u )-, -(S(O) 2 ) w -, >C(O), -(C(O)) w -, -(C(S(O) u )) w -, -(C(O)O) w -,
-(C(R 15 ) 2 ) d O-, -(C(R 15 ) 2 ) d S(O) u -, -O(C(R 15 ) 2 ) d -, -S(O) u (C(R 15 ) 2 ) d -,
-(C(R 15 ) 2 ) d O(C(R 15 ) 2 ) d -, -(C(R 15 ) 2 ) d S(O) u (C(R 15 ) 2 ) d -, -N(R 15 )(C(R 15 ) 2 ) d -,
-(C(R 15 ) 2 ) d NR 15 -, -(C(R 15 ) 2 ) d N(R 15 )(C(R 15 ) 2 ) d -, -(S(R 15 )(O) u ) w -, -(C(R 15 ) 2 ) d -,
-(C(R 15 ) 2 ) d O(C(R 15 ) 2 ) d -, -(C(R 15 ) 2 ) d (C(O)O) w (C(R 15 ) 2 ) d -, -(C(O)O) w (C(R 15 ) 2 ) d -,
-(C(R 15 ) 2 ) d (C(O)O) w -, -(C(S)(R 15 )) w -, -(C(O)) w (CR 15 2 ) d -,
-(CR 15 ) d (C(O)) w (CR 15 ) d -, -(C(R 15 ) 2 ) d (C(O)) w -, -N(R 15 )(C(R 15 ) 2 ) w -,
-OC(R 15 ) 2 C(O)-, -OC(R 15 ) 2 C(O)N(R 15 )-, -(C(R 15 ) 2 ) w N(R 15 )(C(R 15 ) 2 ) w -,
-(C(R 15 ) 2 ) w N(R 15 )-, >P(O) v (R 15 ) x , >P(O) u (R 15 ) 3 , >P(O) u (C(R 15 ) 2 ) d , >Si(R 15 ) 2
and combinations of any of these groups;

u, v and x are each independently 0 to 5;

each d is independently an integer from 1 to 20, or 1 to 12, or 1-6,
or 1 to 3;

each w is independently an integer selected from 1 to 6, or 1 to 3,
or 1 to 2;

with the proviso that Z is cleavable prior to or during analysis of
the biomolecule.

31. The collection of any of claims 1-10, 13-24 and 28, wherein
Z is a trivalent group having any combination of groups selected from the
group consisting of arylene, heteroarylene, cycloalkylene, >C(R 15 ) 2 ,
-C(R 15 )=C(R 15 )-, >C=C(R 23 )(R 24 ), >C(R 23 )(R 24 ), -C≡C-, -O-, >S(A) u ,
>P(D) v (R 15 ), >P(D) v (ER 15 ), >Si(R 15 ) 2 , >N(R 15 ), >N + (R 23 )(R 24 ) and >C(E);
where u is 0, 1 or 2; v is 0, 1, 2 or 3; A is -O- or -NR 15 ; D is -S- or -O-;
and E is -S-, -O- or -NR 15 ; which groups can be combined in any order;
each R 15 is a monovalent group independently selected from the
group consisting of hydrogen and Y 1 -R 18 ;

each Y 1 is a divalent group independently having any combination
of the following groups: a direct link, arylene, heteroarylene,
cycloalkylene, >C(R 17 ) 2 , -C(R 17 )=C(R 17 )-, >C=C(R 23 )(R 24 ), >C(R 23 )(R 24 ),
-C≡C-, -O-, >S(A) u , >P(D) v (R 17 ), >P(D) v (ER 17 ), >N(R 17 ), >N(COR 17 ),
>N + (R 23 )(R 24 ), >Si(R 17 ) 2 and >C(E); where u is 0, 1 or 2; v is 0, 1, 2 or
3; A is -O- or -NR 17 ; D is -S- or -O-; and E is -S-, -O- or -NR 17 ; which groups
can be combined in any order;

R 17 and R 18 are each independently selected from the group
consisting of hydrogen, halo, pseudohalo, cyano, azido, nitro,
-SiR 27 R 28 R 25 , alkyl, alkenyl, alkynyl, haloalkyl, haloalkoxy, aryl, aralkyl,
aralkenyl, aralkynyl, heteroaryl, heteroaralkyl, heteroaralkenyl,
heteroaralkynyl, heterocyclyl, heterocyclylalkyl, heterocyclylalkenyl,
heterocyclylalkynyl, hydroxy, alkoxy, aryloxy, aralkoxy, heteroaralkoxy
and -NR 19 R 20 ;

R 19 and R 20 are each independently selected from hydrogen, alkyl, 
alkenyl, alkynyl, cycloalkyl, aryl, aralkyl, heteroaryl, heteroaralkyl and 
heterocyclyl; 

R 23 and R 24 are selected from (i) or (ii) as follows:
(i) R 23 and R 24 are independently selected from the group consisting
of hydrogen, alkyl, alkenyl, alkynyl, cycloalkyl, aryl and heteroaryl; or

(ii) R 23 and R 24 together form alkylene, alkenylene or cycloalkylene;

R 25 , R 27 and R 28 are each independently a monovalent group
selected from hydrogen, alkyl, alkenyl, alkynyl, haloalkyl, haloalkoxy, aryl,
aralkyl, aralkenyl, aralkynyl, heteroaryl, heteroaralkyl, heteroaralkenyl,
heteroaralkynyl, heterocyclyl, heterocyclylalkyl, heterocyclylalkenyl,
heterocyclylalkynyl, hydroxy, alkoxy, aryloxy, aralkoxy, heteroaralkoxy
and -NR 19 R 20 ;

R 15 , R 17 , R 18 , R 19 , R 20 , R 23 , R 24 , R 25 , R 27 and R 28 can be substituted
with one or more substituents each independently selected from Z 2 ; Z 2 is
selected from alkyl, alkenyl, alkynyl, aryl, cycloalkyl, cycloalkenyl,
hydroxy, -S(O) h R 35 in which h is 0, 1 or 2, -NR 35 R 36 , -COOR 35 , -COR 35 ,
-CONR 35 R 36 , -OC(O)NR 35 R 36 , -N(R 35 )C(O)R 36 , alkoxy, aryloxy, heteroaryl,
heterocyclyl, heteroaryloxy, heterocyclyloxy, aralkyl, aralkenyl, aralkynyl,
heteroaralkyl, heteroaralkenyl, heteroaralkynyl, aralkoxy, heteroaralkoxy,
alkoxycarbonyl, carbamoyl, thiocarbamoyl, alkoxycarbonyl, carboxyaryl,
halo, pseudohalo, haloalkyl and carboxamido;

R 35 and R 36 are each independently selected from among hydrogen, 
halo, pseudohalo, cyano, azido, nitro, trialkylsilyl, dialkylarylsilyl, 
alkyldiarylsilyl, triarylsilyl, alkyl, alkenyl, alkynyl, haloalkyl, haloalkoxy, 
aryl, aralkyl, aralkenyl, aralkynyl, heteroaryl, heteroaralkyl, 
heteroaralkenyl, heteroaralkynyl, heterocyclyl, heterocyclylalkyl, 
heterocyclylalkenyl, heterocyclylalkynyl, hydroxy, alkoxy, aryloxy, 
aralkoxy, heteroaralkoxy, amino, amido, alkylamino, dialkylamino, 
alkylarylamino, diarylamino and arylamino; 

with the proviso that Z is cleavable prior to or during analysis, 
including mass spectrometric analysis of the compound. 

32. The collection of any of claims 1-10, 13-24 and 28-31,
wherein Z has the formula:
-(S 1 ) t -M(R 15 ) a -(S 2 ) b -L-, wherein:

S 1 and S 2 are spacer moieties;

t and b are each independently 0 or 1;

a is an integer from 0 to 4;

M is a central moiety possessing three or more points of
attachment;

each R 15 is a monovalent group independently selected from Y 2 -R 18 ;

each Y 2 is a divalent group independently having any combination
of the following groups: a direct link, arylene, heteroarylene,
cycloalkylene, >C(R 17 ) 2 , -C(R 17 )=C(R 17 )-, >C=C(R 23 )(R 24 ), >C(R 23 )(R 24 ),
-C≡C-, -O-, >S(A) u , >P(D) v (R 17 ), >P(D) v (ER 17 ), >N(R 17 ), >N(COR 17 ),
>N + (R 23 )(R 24 ), >Si(R 17 ) 2 and >C(E); where u is 0, 1 or 2; v is 0, 1, 2 or
3; A is -O- or -NR 17 ; D is -S- or -O-; and E is -S-, -O- or -NR 17 ; which groups
can be combined in any order;

R 17 and R 18 are each independently selected from the group
consisting of hydrogen, halo, pseudohalo, cyano, azido, nitro,
-SiR 27 R 28 R 25 , alkyl, alkenyl, alkynyl, haloalkyl, haloalkoxy, aryl, aralkyl,
aralkenyl, aralkynyl, heteroaryl, heteroaralkyl, heteroaralkenyl,
heteroaralkynyl, heterocyclyl, heterocyclylalkyl, heterocyclylalkenyl,
heterocyclylalkynyl, hydroxy, alkoxy, aryloxy, aralkoxy, heteroaralkoxy
and -NR 19 R 20 ;

R 19 and R 20 are each independently selected from hydrogen, alkyl,
alkenyl, alkynyl, cycloalkyl, aryl, aralkyl, heteroaryl, heteroaralkyl and
heterocyclyl;

R 23 and R 24 are selected from (i) or (ii) as follows:

(i) R 23 and R 24 are independently selected from the group consisting
of hydrogen, alkyl, alkenyl, alkynyl, cycloalkyl, aryl and heteroaryl; or

(ii) R 23 and R 24 together form alkylene, alkenylene or cycloalkylene;
R 25 , R 27 and R 28 are each independently a monovalent group
selected from hydrogen, alkyl, alkenyl, alkynyl, haloalkyl, haloalkoxy, aryl,
aralkyl, aralkenyl, aralkynyl, heteroaryl, heteroaralkyl, heteroaralkenyl,
heteroaralkynyl, heterocyclyl, heterocyclylalkyl, heterocyclylalkenyl,
heterocyclylalkynyl, hydroxy, alkoxy, aryloxy, aralkoxy, heteroaralkoxy
and -NR 19 R 20 ;

R 15 , R 17 , R 18 , R 19 , R 20 , R 23 , R 24 , R 25 , R 27 and R 28 can be substituted
with one or more substituents each independently selected from Z 2 ; Z 2 is
selected from alkyl, alkenyl, alkynyl, aryl, cycloalkyl, cycloalkenyl,
hydroxy, -S(O) h R 35 in which h is 0, 1 or 2, -NR 35 R 36 , -COOR 35 , -COR 35 ,
-CONR 35 R 36 , -OC(O)NR 35 R 36 , -N(R 35 )C(O)R 36 , alkoxy, aryloxy, heteroaryl,
heterocyclyl, heteroaryloxy, heterocyclyloxy, aralkyl, aralkenyl, aralkynyl,
heteroaralkyl, heteroaralkenyl, heteroaralkynyl, aralkoxy, heteroaralkoxy,
alkoxycarbonyl, carbamoyl, thiocarbamoyl, alkoxycarbonyl, carboxyaryl,
halo, pseudohalo, haloalkyl and carboxamido;

R 35 and R 36 are each independently selected from among hydrogen,
halo, pseudohalo, cyano, azido, nitro, trialkylsilyl, dialkylarylsilyl,
alkyldiarylsilyl, triarylsilyl, alkyl, alkenyl, alkynyl, haloalkyl, haloalkoxy,
aryl, aralkyl, aralkenyl, aralkynyl, heteroaryl, heteroaralkyl,
heteroaralkenyl, heteroaralkynyl, heterocyclyl, heterocyclylalkyl,
heterocyclylalkenyl, heterocyclylalkynyl, hydroxy, alkoxy, aryloxy,
aralkoxy, heteroaralkoxy, amino, amido, alkylamino, dialkylamino,
alkylarylamino, diarylamino and arylamino; and

L is a group that is cleavable prior to or during mass spectrometric
analysis of the compound.

33. The collection of claim 32, wherein M is a tetravalent 
alkylene, tetravalent phenylene, tetravalent biphenylene or a tetravalent 
heterobifunctional trityl derivative, and is unsubstituted or is substituted 
with 1 to 4 groups, each independently selected from R 15 . 

34. The collection of claim 32 or claim 33, wherein M is at least
a trivalent group selected from the following groups absent a hydrogen
atom: -(CH 2 ) r -, -(CH 2 O) r -, -(CH 2 CH 2 -O) r -, -(NH-(CH 2 ) r -C(=O)) s -,
-(NH-CH(R 52 )-C(=O)) r -, -(O-(CH) r -C(=O)) s -,

[Structures: cyclic groups substituted with (R 15 ) z ; not recoverable
from the source]

wherein r and s are each independently an integer from 1 to 10; R 52 is the
side chain of a natural α-amino acid; and z is an integer from 1 to 4.

35. The collection of any of claims 28-34, wherein S 1 and S 2 are
each independently selected from -(CH 2 ) r -, -(CH 2 O)-,
-(CH 2 CH 2 -O) r -, -(NH-(CH 2 ) r -C(=O)) s -, -(NH-CH(R 52 )-C(=O)) s -,
-(O-(CH) r -C(=O)) s -,

[Structures not recoverable from the source]

wherein r and s are each independently an integer from 1 to 10; R 52 is the
side chain of a natural α-amino acid; and y is an integer from 0 to 4.

36. The collection of any of claims 32-35, wherein L is a
disulfide moiety, a photocleavable group, an acid cleavable group, an
alkaline cleavable group, an oxidatively cleavable group, or a reductively
cleavable group.

37. The collection of any of claims 32-36, wherein L is a trityl
ether, an ortho nitro substituted aryl group, an o-nitrobenzyl, a phenacyl or a
nitrophenylsulfenyl group.






38. The collection of any of claims 32-37, wherein: 
L has formula I, II or III as follows:

[structural formulas (I), (II) and (III) not reproduced in the source text]




R20 is ω-(4,4'-dimethoxytrityloxy)alkyl or ω-hydroxyalkyl; R is
selected from hydrogen, alkyl, aryl, alkoxycarbonyl, aryloxycarbonyl and
carboxy;

R21 is selected from hydrogen, alkyl, aryl, alkoxycarbonyl,
aryloxycarbonyl and carboxy;

R22 is hydrogen; t is 0-3;
R50 is alkyl, alkoxy, aryl or aryloxy;

X20 is hydrogen, alkyl or OR20;
R1 is hydrogen;

R2 is selected from among ω-hydroxyalkoxy, ω-(4,4'-dimethoxytrityloxy)alkoxy,
ω-hydroxyalkyl and ω-(4,4'-dimethoxytrityloxy)alkyl,
and is unsubstituted or substituted on the alkyl
or alkoxy chain with one or more alkyl groups; and
c and e are each independently 0-4.
39. The collection of any of claims 32-38, wherein: 
L is selected from among -S-S-, -O-P(=O)(OR51)-NH-,
p-Me-o-NO2-PhCH2-, -O-C(=O)-, and

[structural formulas not reproduced in the source text]



R51 is straight or branched chain alkyl, straight or branched chain
alkenyl, straight or branched chain alkynyl, aryl, heteroaryl, cycloalkyl,
heterocyclyl, straight or branched chain aralkyl, straight or branched chain
aralkenyl, straight or branched chain aralkynyl, straight or branched chain
heteroaralkyl, straight or branched chain heteroaralkenyl, straight or
branched chain heteroaralkynyl, straight or branched chain cycloalkylalkyl,
straight or branched chain cycloalkylalkenyl, straight or branched chain
cycloalkylalkynyl, straight or branched chain heterocyclylalkyl, straight or
branched chain heterocyclylalkenyl or straight or branched chain
heterocyclylalkynyl; and

y is an integer from 0 to 4.

40. The collection of any of claims 32-39, wherein R15 is -H,
-OH, -OR51, -SH, -SR51, -NH2, -NHR51, -N(R51)2, -F, -Cl, -Br, -I, -SO3H,
-PO4(2-), -CH3, -CH2CH3, -CH(CH3)2 or -C(CH3)3; where R51 is straight or
branched chain alkyl, straight or branched chain alkenyl, straight or
branched chain alkynyl, aryl, heteroaryl, cycloalkyl, heterocyclyl, straight
or branched chain aralkyl, straight or branched chain aralkenyl, straight or
branched chain aralkynyl, straight or branched chain heteroaralkyl,
straight or branched chain heteroaralkenyl, straight or branched chain
heteroaralkynyl, straight or branched chain cycloalkylalkyl, straight or
branched chain cycloalkylalkenyl, straight or branched chain
cycloalkylalkynyl, straight or branched chain heterocyclylalkyl, straight or
branched chain heterocyclylalkenyl or straight or branched chain
heterocyclylalkynyl.

41. The collection of any of claims 1-10 and 13-40, wherein
each X is selected from the group consisting of an active ester, an active
halo moiety, an amino acid side chain-specific functional group, a reagent
that binds to the active site of an enzyme, a ligand that binds to a receptor, a
specific peptide that binds to a biomolecule surface, a lectin, an
antibody, an antigen, biotin and streptavidin.

42. The collection of any of claims 1-10 and 13-41, wherein an
X is an α-halo ether, an α-halo carbonyl group, maleimido, a metal
complex, an epoxide, an isothiocyanate, or an antibody against
phosphorylated or glycosylated peptides/proteins.

43. The collection of any of claims 1-10 and 13-42, wherein X is
-C(=O)O-Ph-pNO2, -C(=O)O-C6F5, -C(=O)-O-(N-succinimidyl), -OCH2-I,
-OCH2-Br, -OCH2-Cl, -C(O)CH2I, -C(O)CH2Br or -C(O)CH2Cl.

44. The collection of any of claims 1-10 and 13-43, wherein
member compounds comprise a mass modifying tag linked to Z. 

45. The collection of any of claims 28-44, wherein member
compounds comprise a mass modifying tag; and the mass modifying tag
is linked to Z or is S2.

46. The collection of any of claims 32-45, wherein: 
the mass modified Z moiety has the formula: 

-(S1)r-M(R15)a-(S2)b-L-T-; and

T is a mass modifying tag.

47. The collection of any of claims 44-46, wherein the mass
modifying tag is a divalent group having the formula -X1R10- and is
selected from (i)-(vii) as follows:

(i) X1 is a divalent group selected from -O-,
-O-C(O)-(CH2)y-C(O)O-, -NH-C(O)-, -C(O)-NH-,
-NH-C(O)-(CH2)y-C(O)O-, -NH-C(S)-NH-, -O-P(O-alkyl)-O-,
-O-SO2-O-, -O-C(O)-CH2-S-, -S-, -NH- and

[structural formula not reproduced in the source text]; and



R10 is a divalent group selected from
-(CH2CH2O)z-CH2CH2O-, -(CH2CH2O)z-CH2CH2O-alkylene,
alkylene, alkenylene, alkynylene, arylene, heteroarylene,
-(CH2)z-CH2-O-, -(CH2)z-CH2-O-alkylene,
-(CH2CH2NH)z-CH2CH2NH-, -CH2-CH(OH)-CH2O-,
-Si(R12)(R13)-, -CHF- and -CF2-; where y is an integer from 1
to 20; z is an integer from 0 to 200; R11 is the side chain of
a naturally occurring α-amino acid; and R12 and R13 are each
independently selected from alkyl, aryl and aralkyl;
(ii) -S-S-;
(iii) -S-;

(iv) -(NH-(CH2)y-NH-C(O)-(CH2)y-C(O))z-NH-(CH2)y-NH-C(O)-
(CH2)y-C(O)O-, where y and z are selected as in (i);

(v) -(NH-(CH2)y-C(O))z-NH-(CH2)y-C(O)O-, where y and z
are selected as in (i);

(vi) -(NH-CH(R11)-C(O))z-NH-CH(R11)-C(O)O-, where R11 and
z are selected as in (i); or

(vii) -(O-(CH2)y-C(O))z-NH-(CH2)y-C(O)O-, where y and z are
selected as in (i).

48. The collection of claim 45, wherein S2 has the formula
-X1R10-, where -X1R10- is selected from (i)-(vii) as follows:
(i) X1 is a divalent group selected from -O-,
-O-C(O)-(CH2)y-C(O)O-, -NH-C(O)-, -C(O)-NH-,
-NH-C(O)-(CH2)y-C(O)O-, -NH-C(S)-NH-, -O-P(O-alkyl)-O-,
-O-SO2-O-, -O-C(O)-CH2-S-, -S-, -NH- and

[structural formula not reproduced in the source text]; and



R10 is a divalent group selected from
-(CH2CH2O)z-CH2CH2O-, -(CH2CH2O)z-CH2CH2O-alkylene,
alkylene, alkenylene, alkynylene, arylene, heteroarylene,
-(CH2)z-CH2-O-, -(CH2)z-CH2-O-alkylene,
-(CH2CH2NH)z-CH2CH2NH-, -CH2-CH(OH)-CH2O-,
-Si(R12)(R13)-, -CHF- and -CF2-; where y is an integer from 1
to 20; z is an integer from 0 to 200; R11 is the side chain of
a naturally occurring α-amino acid; and R12 and R13 are each
independently selected from alkyl, aryl and aralkyl;
(ii) -S-S-;

(iii) -S-;

(iv) -(NH-(CH2)y-NH-C(O)-(CH2)y-C(O))z-NH-(CH2)y-NH-C(O)-
(CH2)y-C(O)O-, where y and z are selected as in (i);

(v) -(NH-(CH2)y-C(O))z-NH-(CH2)y-C(O)O-, where y and z
are selected as in (i);

(vi) -(NH-CH(R11)-C(O))z-NH-CH(R11)-C(O)O-, where R11 and
z are selected as in (i); or

(vii) -(O-(CH2)y-C(O))z-NH-(CH2)y-C(O)O-, where y and z are
selected as in (i).

49. The collection of any of claims 1-10 and 13-48, wherein Q
is an oligonucleotide comprising at least "j" nucleotides; and the
collection comprises about 10 to 4^j compounds, where j
is the number of bases in the single-stranded portion of the
oligonucleotide.
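
As a rough illustration of the 4^j upper bound recited above, the following Python sketch counts and enumerates the distinct single-stranded sequences of length j. It assumes the four unmodified DNA bases; the function names are hypothetical and are not part of the claims.

```python
from itertools import product

BASES = "ACGT"  # assumption: the four standard, unmodified DNA bases


def max_sorting_sequences(j: int) -> int:
    """Return the number of distinct sequences of length j, i.e. 4**j."""
    if j < 1:
        raise ValueError("j must be a positive integer")
    return len(BASES) ** j


def enumerate_sequences(j: int):
    """Yield every possible sequence of length j over the four bases."""
    for combo in product(BASES, repeat=j):
        yield "".join(combo)
```

For example, under these assumptions a single-stranded portion of j = 5 bases allows at most 4^5 = 1024 distinguishable sorting sequences.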

50. The collection of claim 49, wherein Z is a moiety that is
cleavable during mass spectrometric analysis of the compounds.

51. The collection of claim 49, wherein Z is a moiety that is not
cleavable during mass spectrometric analysis of the compounds.

52. A composition, comprising a collection of any of claims 1-10
and 13-51, hybridized to a plurality of oligonucleotides or analogs thereof
that comprise oligonucleotides that are complementary to each Q.

53. The composition of claim 52, wherein the oligonucleotides or
analogs thereof that are complementary to Q are immobilized on a solid
support as an array.

54. The composition of claim 53 that is an addressable array.

55. The collection of any of claims 1-10 and 13-54, further 
comprising biomolecules covalently bound to one or more capture 
compounds in the collection. 

56. The collection of claim 55, wherein the biomolecules
comprise proteins.

57. The solid support of claim 11 or claim 12, wherein the array
is an addressable array. 

58. A method for analysis of biomolecules, comprising: 

a) contacting a composition comprising a biomolecule with a
collection of capture compounds of any of claims 1-10 and 13-56 to form
capture compound-biomolecule complexes; and 

b) identifying or detecting bound biomolecules. 

59. The method of claim 58, wherein capture compounds in the
collection further comprise a solubility group W that influences the
solubility properties of the capture compound.

60. The method of claim 58 or claim 59, wherein the 
biomolecules are proteins. 

61. The method of any of claims 58-60, wherein:

the capture compounds are in an addressable array; and

each locus in the array contains a different set of capture
compounds.

62. The method of any of claims 58-61, wherein identification 
comprises mass spectrometric analysis of the bound biomolecules. 

63. The method of any of claims 58-62, wherein the
biomolecules are proteins.

64. The method of claim 62, wherein the biomolecules are
proteins.

65. The method of any of claims 58-64, wherein the
biomolecules bound to the capture compounds are treated with a
protease prior to mass spectrometric analysis.

66. The method of any of claims 58-65, wherein each set of
compounds in the collection comprises the same X moiety but differs in Y
moiety.

67. The method of any of claims 58-66, wherein each set of 
compounds in the collection comprises different X, Y and Q moieties. 

68. The method of any of claims 58-67, wherein each set of 
compounds in the collection comprises different X and Y moieties. 

69. A method for separating protein conformers, comprising:

contacting a composition comprising a biomolecule with a
collection of capture compounds of any of claims 1-10 and 13-56;
separating the members of the collection; and
identifying the bound proteins from the mixture, whereby each
conformer has different binding specificity for members of the collection.

70. The method of claim 69, wherein identification is
effected by mass spectrometry.

71. The method of claim 69 or claim 70, wherein at least one
conformer is associated with a disease. 

72. The collection of any of claims 1-10 and 13-56, wherein
capture compounds comprise:

a central core Z linked to a reactive functionality X and a
selectivity functionality Y, whereby a capture compound forms a covalent
bond with a biomolecule in the mixture or interacts with high stability
such that the affinity of binding of the capture compound to the
biomolecule through the reactive functionality in the presence of the
selectivity functionality is at least ten-fold greater than in the absence of
the selectivity functionality.

73. A method for reducing the diversity of a complex mixture of
biomolecules, comprising:

contacting the mixture with a collection of capture compounds of
any of claims 1-10 and 13-56 to form complexes of capture compounds
with bound biomolecules; and either before, during or after contacting,
separating each set of complexes of capture compounds with
biomolecules from the other sets.

74. A method for identification of phenotype-specific 
biomolecules, comprising: 

sorting cells from a single subject according to a
predetermined phenotype to produce at least two separated sets of cells;

contacting mixtures of biomolecules from each set of sorted
cells with a collection of capture compounds of any of claims 1-10 and
13-56; and

comparing the patterns of biomolecules binding from each
set to identify biomolecules that differ for each set, thereby identifying
phenotype-specific biomolecules.

75. The method of claim 74, wherein the cells are synchronized 
or frozen in a metabolic state before sorting and/or after sorting. 

76. The method of claim 74 or claim 75, wherein the
biomolecules comprise proteins.

77. The method of any of claims 74-76, wherein the bound 
biomolecules are identified by mass spectrometry. 

78. The method of any of claims 74-77, wherein each capture
compound includes a moiety X that covalently binds to proteins; and a
moiety that increases the selectivity of the binding such that the capture
compound binds to fewer proteins when the selectivity moiety is present
than in its absence.

79. The method of any of claims 74-78, wherein each capture 
compound further comprises a moiety Q for arraying of the capture
compounds at different loci on a solid support.

80. The method of any of claims 74-79, wherein capture
compounds comprise a moiety X that covalently binds to proteins; and a
moiety Q that permits arraying of the capture compounds on a solid
support by binding to the surface or a molecule thereon.

81. The method of any of claims 74-80, wherein the phenotypes
are diseased and healthy phenotypes.

82. The method of claim 81, wherein the disease
phenotype is a tumor and the healthy phenotype is non-tumor.

83. The method of any of claims 58-68, wherein:

each sorting function Q is an oligonucleotide that includes a single-
stranded portion of length "j" for hybridizing to a complementary
oligonucleotide, wherein j is at least 5 bases; and the method further
comprises:

hybridizing the capture compounds to a set of complementary
oligonucleotides, which are attached to a solid support, wherein the
hybridizing is effected before or after the contacting step, thereby
immobilizing the capture compounds or capture compound-biomolecule
complexes on the solid support.
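
The hybridization step above relies on base complementarity between the single-stranded portion of Q and the support-bound oligonucleotide. A minimal sketch, assuming standard Watson-Crick pairing of unmodified DNA bases and a hypothetical helper name, of deriving the complementary oligonucleotide for a given Q sequence:

```python
# Translation table for Watson-Crick complements (assumption: plain DNA bases).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complementary_oligo(q_single_stranded: str, min_len: int = 5) -> str:
    """Return the reverse complement (5'->3') of the single-stranded Q portion.

    min_len mirrors the "j is at least 5 bases" condition in the claim.
    """
    seq = q_single_stranded.upper()
    if len(seq) < min_len:
        raise ValueError("single-stranded portion is shorter than min_len")
    if set(seq) - set("ACGT"):
        raise ValueError("sequence contains characters other than A, C, G, T")
    # Complement each base, then reverse so the result reads 5'->3'.
    return seq.translate(_COMPLEMENT)[::-1]
```

The sketch ignores base analogs and hybridization thermodynamics; it only shows the sequence relationship between Q and its immobilized partner.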

84. The method of claim 83, wherein the single-stranded
oligonucleotides or oligonucleotide analogs that are complementary to the
Q moiety on the capture compounds are immobilized on a solid support.

85. The method of claim 83 or claim 84, wherein the single
stranded oligonucleotides or oligonucleotide analogs that are
complementary to the Q moiety of the capture compounds comprise an addressable array.

86. The method of any of claims 58-68 and 83-85, wherein the 
collection comprises at least 60, 200, 500, 1000 or 1500 sets of 
different capture reagents.

87. The method of any of claims 58-68 and 83-86, wherein the
contacting step is performed in an aqueous medium and the biomolecules
are hydrophilic.

88. The method of any of claims 58-68 and 83-87, wherein the
contacting step is performed in a hydrophobic medium and the
biomolecules are hydrophobic.

89. The method of any of claims 58-68 and 83-88, wherein 
identification or detection is effected by mass spectrometric analysis of 
the biomolecule-capture compound complexes. 

90. The method of claim 89, wherein the mass spectrometric
format is matrix assisted laser desorption ionization-time of flight (MALDI-TOF)
mass spectrometry.

91. The method of any of claims 58-68 and 83-90, wherein the
biomolecules comprise proteins.

92. The method of any of claims 58-68 and 83-91, wherein:
the capture compounds comprise a sorting function Q for arraying
the compounds on a solid support; and

the method further comprises arraying the capture compounds on a
solid support before, during or after the contacting step, wherein:

the resulting biomolecule-capture compound complexes are at
discrete spots on a solid support.

93. The method of claim 92, wherein mass spectrometric
analysis of the bound biomolecules comprises:

(i) addition of matrix to the biomolecule-capture
agent complexes; and

(ii) spot-by-spot matrix assisted laser desorption
ionization-time of flight (MALDI-TOF) mass spectrometry.

94. The method of any of claims 58-68 and 83-93, further 

comprising: 

chemical or enzymatic treatment of the biomolecule-capture
compound complexes to remove or cleave portions thereof.

95. The method of any of claims 62-68 and 83-94, wherein
mass spectrometric analysis of the bound biomolecules comprises:

(i) addition of matrix to the sets of biomolecule-capture
agent complexes; and

(ii) matrix assisted laser desorption ionization-time
of flight (MALDI-TOF) mass spectrometry of each set of biomolecule-capture
agent complexes.

96. The method of any of claims 58-68 and 83-95, wherein the
composition comprising a biomolecule is a cell lysate.

97. The method of claim 96, wherein the cells from which the
lysate is produced are synchronized or frozen in a metabolic state. 

98. A system for analysis of mixtures of biomolecules, 
comprising: 

a collection of capture compounds, comprising sets of
capture compounds, wherein each set of capture compounds includes a
moiety X, which is different in each set, that is selected to covalently
bind to biomolecules or to bind with sufficiently high affinity so that
complexes of biomolecule/capture compounds are stable under conditions
of mass spectrometric analysis; a moiety Y that increases the selectivity
of the binding such that the capture compound binds to fewer
biomolecules when the selectivity moiety is present than in its absence;

a computer programmed with instructions for controlling and
directing analysis of biomolecules using the collections;

a mass spectrometer; and

software for analysis of data produced by the mass
spectrometer.

99. The system of claim 98 that is an automated system. 

100. The system of claim 98 or claim 99, further comprising a
liquid chromatographic device.

101. A method of processing the mass spectrometric data 
produced by the method of any of claims 62-68 and 83-97, comprising: 

(a) subtracting any background; 
(b) reducing noise;

(c) calibrating molecular weight; and 

(d) refining peaks. 

102. The method of claim 101, wherein step (d) comprises peak
integration. 
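
The four processing steps (a) through (d) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the processing the applicants used: the baseline is taken as the median intensity, noise reduction is a moving average, calibration is a two-point linear fit, and peak refinement is trapezoidal peak integration. All function names and parameters are hypothetical.

```python
from statistics import median


def subtract_background(intensities):
    """(a) Subtract a constant baseline, estimated here as the median intensity."""
    baseline = median(intensities)
    return [max(0.0, y - baseline) for y in intensities]


def reduce_noise(intensities, window=3):
    """(b) Smooth with a centered moving average (window shrinks at the edges)."""
    half = window // 2
    smoothed = []
    for i in range(len(intensities)):
        lo, hi = max(0, i - half), min(len(intensities), i + half + 1)
        smoothed.append(sum(intensities[lo:hi]) / (hi - lo))
    return smoothed


def calibrate(raw_mz, ref_a, ref_b):
    """(c) Two-point linear calibration; each reference is (observed, true)."""
    (obs_a, true_a), (obs_b, true_b) = ref_a, ref_b
    slope = (true_b - true_a) / (obs_b - obs_a)
    return [true_a + slope * (x - obs_a) for x in raw_mz]


def integrate_peaks(mz, intensities, threshold):
    """(d) Group contiguous points above threshold; return (apex m/z, area)."""
    peaks, start = [], None
    # A trailing 0.0 closes a peak that runs to the end of the spectrum.
    for i, y in enumerate(list(intensities) + [0.0]):
        if y > threshold and start is None:
            start = i
        elif y <= threshold and start is not None:
            area = sum(
                (mz[k + 1] - mz[k]) * (intensities[k] + intensities[k + 1]) / 2
                for k in range(start, i - 1)
            )
            apex = max(range(start, i), key=lambda k: intensities[k])
            peaks.append((mz[apex], area))
            start = None
    return peaks
```

In practice each step would be replaced by an instrument-appropriate method; the sketch only fixes the order of operations named in the claim.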

103. The method of claim 101 or claim 102, further comprising:

(e) comparing the processed data with existing protein
databases or DNA databases containing open reading frames
to determine whether the protein is known, and

(f) if the protein is known, identifying modifications.

104. The method of any of claims 101-103, further comprising:

comparing data from tissues of healthy and diseased individuals, or
from different physiological or developmental stages, or from different
parts of a tissue.

105. A method of analysis of biomolecules, comprising:
(a) reacting a first mixture of biomolecules with the collection of
claim 1 to form a mixture of compound-biomolecule complexes, wherein
the compounds comprising the mixture have a first mass modifying tag;

(b) reacting a second mixture of biomolecules with the
collection of any of claims 1-10 and 13-56 to form a mixture of
compound-biomolecule complexes, wherein the compounds comprising
the complexes have either (1) no mass modifying tag; or (2) a second
mass modifying tag;

(c) pooling the products of steps (a) and (b) to produce a mixture
thereof;

(d) sorting the compound-biomolecule complexes in the mixture
of step (c) according to Q moiety to produce an array of sorted
complexes; and

(e) analyzing the complexes at each locus.
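
One way the pooled, differentially tagged complexes might be compared at a locus is by pairing peaks that are offset by the mass difference between the two tags. The following Python sketch is an assumption-laden illustration, not a method stated in the claims: it assumes singly charged ions (so the m/z offset equals the tag mass difference), and the function name, tolerance and data layout are hypothetical.

```python
def pair_tagged_peaks(peaks, tag_delta, tol=0.5):
    """Pair peaks whose m/z values differ by tag_delta (within tol).

    peaks: list of (mz, intensity) tuples from one array locus.
    Returns a list of (lighter_mz, heavier_mz, heavier/lighter intensity ratio).
    """
    ordered = sorted(peaks)
    pairs = []
    for i, (mz_light, int_light) in enumerate(ordered):
        if int_light <= 0:
            continue  # cannot form a ratio against an empty peak
        for mz_heavy, int_heavy in ordered[i + 1:]:
            if abs((mz_heavy - mz_light) - tag_delta) <= tol:
                pairs.append((mz_light, mz_heavy, int_heavy / int_light))
    return pairs
```

Under these assumptions, the intensity ratio of a matched pair gives a relative estimate of the same biomolecule's abundance in the two starting mixtures.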
106. The collection of any of claims 1-10 and 13-56, wherein
compounds in the collection comprise Z, which comprises a reagent of a
luminescence assay or a group that is detected in a colorimetric assay;
and a sorting group Q that comprises a single-stranded oligonucleotide.

107. The collection of claim 17, wherein Z is a solid support. 

108. The collection of claim 17, wherein Z is a particulate support.

109. A method for analyzing biomolecules, comprising: 

(i) reacting a mixture of biomolecules with the composition of 
claim 106 to form a mixture of compound-biomolecule 
complexes; 

(ii) hybridizing the compound-biomolecule conjugate mixture to 
single stranded oligonucleotides or oligonucleotide analogs 
that are complementary to the Q moiety of the compound- 
biomolecule complexes to form double stranded hybrids and 

(iii) analyzing the double stranded hybridized complexes. 

110. The method of claim 109, wherein the quantity of a 
biomolecule from different experiments is determined by luminescence or 
by detecting the colorimetric tag.

111. The method of claim 109 or claim 110, wherein
spectrophotometrically differentiatable dyes are used in the quantification
analysis.

112. The method of any of claims 58-68 and 83-97, wherein the 
analysis is orthogonal time of flight (O-TOF) mass spectrometry. 

113. The method of any of claims 58-68 and 83-97, wherein the 
analysis is electrospray (ES) mass spectrometry. 






114. A method for analyzing biomolecule interactions, comprising:

a) contacting a mixture of biomolecules with a collection of
capture compounds of any of claims 1-10 and 13-56, to form
compound-biomolecule complexes, wherein:

Z is a moiety that is not cleavable prior to or during mass
spectrometric analysis of biomolecules bound to the capture
compound; and

the complexes are stable to matrix assisted laser desorption
ionization-time of flight (MALDI-TOF) mass spectrometry
conditions;

b) contacting the capture compound-biomolecule complexes
with a mixture containing compounds selected from the group consisting
of mixtures of biomolecules and small molecule test compounds,
wherein compounds in the mixture bind to biomolecules in the
complexes;

c) before or after step b) immobilizing the capture compounds
on a solid support via the Q group of each set of capture compounds; and

d) analyzing the bound compounds by mass spectrometry.

115. The method of claim 114, wherein the small molecule test
compounds are candidate drugs and are selected from the group consisting
of small organic molecules, peptides, peptide mimetics, antisense
molecules or dsRNA, antibodies, fragments of antibodies and recombinant
or synthetic antibodies and fragments thereof; and

the method is a method for identifying candidate drugs that bind to
biomolecules.

116. The method of claim 114 or claim 115, wherein the capture
compound-biomolecule complexes are contacted in step a) with a mixture 
of biomolecules to identify components of biomolecule complexes or 
biochemical pathways.

117. The method of any of claims 114-116, wherein the
biomolecules are proteins.

118. A collection of capture compounds, comprising: 

a plurality of capture compounds, comprising sets of capture compounds,
wherein each set of capture compounds includes a moiety X, which is
different in each set, that is selected to covalently bind to biomolecules or
to bind with sufficiently high affinity so that the resulting complexes of
biomolecule/capture compounds are stable under conditions of mass
spectrometry analysis; a moiety Y that increases the selectivity of the
binding by X such that the capture compound binds to fewer biomolecules
when the selectivity moiety is present than in its absence; and a moiety Z
for presenting X and Y.

119. The collection of any of claims 1-10 and 13-56, wherein the
capture compounds further comprise a solubility group W that influences
the solubility properties of the capture compound.

120. The collection of any of claims 1-10 and 13-56, wherein the 
selectivity function Y is selected from those set forth in Figure 17 and/or 
the reactivity function X is selected from those set forth in Figure 16. 

121. A capture compound, comprising a moiety Z that is a
trivalent trityl for presenting functional moieties X, Y and Q, wherein
moiety X is selected to covalently bind to biomolecules or to bind with
sufficiently high affinity so that the resulting complexes of
biomolecule/capture compounds are stable under conditions of mass
spectrometric analysis; moiety Y increases the selectivity of the binding
by X such that the capture compound binds to fewer biomolecules when Y
is present than in its absence; and moiety Q permits separation of sets of
capture compounds.

122. The capture compound of claim 121, wherein Z has the
formula:

[structural formula not reproduced in the source text]

123. The capture compound of claim 121 or claim 122, wherein
X is selected from the groups set forth in Figure 16.

124. The capture compound of any of claims 121-123, wherein Y 
is selected from the groups set forth in Figure 17. 

125. The capture compound of any of claims 121-124, wherein Q
is an oligonucleotide or oligonucleotide analog that includes a
single-stranded portion of sufficient length "j" to form a stable hybrid with a
base-complementary single stranded nucleic acid molecule or analog.

SMALL MOLECULES

CH3, C2H5, and longer chain alkyl groups.

[structure not reproduced in the source text] -Ala-Ala-Ala-NH-Acetyl





NATURAL PRODUCTS 






Cholesterol, Steroids, alkaloids, flavonoids, prostaglandin, peptides, EGF, 
rapamycin 

PROTEIN AGONISTS AND ANTAGONISTS 

1,1,1-Trifluoro-6Z,9Z,12Z,15Z-heneicosatetraen-2-one,

trans-4-[3-Methyl-6-(1-methylethenyl)-2-cyclohexen-1-yl]-5-pentyl-1,3-
benzenediol,

Arachidonyl-2'-chloroethylamide /

(all Z)-N-(2-chloroethyl)-5,8,11,14-eicosatetraenamide,

Arachidonylcyclopropylamide / (all Z)-N-(cyclopropyl)-5,8,11,14-
eicosatetraenamide,

N-(Piperidin-1-yl)-5-(4-iodophenyl)-1-(2,4-dichlorophenyl)-4-methyl-1H-pyrazole-
3-carboxamide,

1-(2,4-Dichlorophenyl)-5-(4-iodophenyl)-4-methyl-N-4-morpholinyl-1H-pyrazole-3-
carboxamide,

(all Z)-N-(4-Hydroxyphenyl)-5,8,11,14-eicosatetraenamide,

[6-Iodo-2-methyl-1-[2-(4-morpholinyl)ethyl]-1H-indol-3-yl](4-methoxyphenyl)
methanone,

Arachidonylethanolamide / (all Z)-N-(2-Hydroxyethyl)-5,8,11,14-
eicosatetraenamide,

FIG. 17a

Arachidonylethanolamide / (all Z)-N-(2-Hydroxyethyl)-5,8,11,14-
eicosatetraenamide,

N-(2-Hydroxyethyl)-[5,6,8,9,11,12,14,15-3H]-5Z,8Z,11Z,14Z-eicosatetraenamide,

2-AG / (5Z,8Z,11Z,14Z)-5,8,11,14-Eicosatetraenoic acid, 2-hydroxy-1-
(hydroxymethyl)ethyl ester,

(-)-cis-3-[2-Hydroxy-4-(1,1-dimethylheptyl)phenyl]-trans-4-(3-
hydroxypropyl)cyclohexanol,

Docosatetraenylethanolamide / N-(2-Hydroxyethyl)-7Z,10Z,13Z,16Z-
docosatetraenamide,

(6aR)-trans-3-(1,1-Dimethylheptyl^^^ 
dimethyl-6H-dibenzo[b,d]pyran-9-methanol, 

[6aR-(6aa,9a,10ap)]-3-(1,1-Dimethyl^ 

hydroxy-e.e-dimethyl-eH-dibenzotb^lpyran-iy.S-Hl-g-methanol, 

(2-Methyl-1-propyl-1H-indol-3-yl)-1-naphthalenylmethanone, 

(6aR,10aR)-3-(1 f 1-dimethylbutyl)-6a,7J^ 
d i benzo [b ,d] pyra n , 

Methyl arachidonyl fluorophosphonate / (5Z,8Z,11Z,14Z)-5,8,11,14-
eicosatetraenyl-methyl ester phosphonofluoridic acid,

[R-(all-Z)]-N-(2-Hydroxy-1-methylethyl)-5,8,11,14-eicosatetraenamide, 2-
[(5Z,8Z,11Z,14Z)-Eicosatetraenyloxy]-1,3-propanediol,

N-(bis-3-chloro-4-hydroxybenzyl)-5Z,8Z,11Z,14Z-eicosatetraenamide,

(9Z)-N-(2-Hydroxyethyl)-9-octadecenamide,
N-(2-Hydroxyethyl)hexadecanamide,

(5Z,8Z,11Z,14Z)-N-(4-Hydroxy-2-methylphenyl)-5,8,11,14-eicosatetraenamide,

(R)-(+)-[2,3-Dihydro-5-methyl-3-(4-morpholinylmethyl)pyrrolo[1,2,3-de]-1,4-
benzoxazin-6-yl]-1-naphthalenylmethanone

PEPTIDES AND ANTIBODIES PROBES FOR CENTRAL NERVOUS
SYSTEM DISEASES



β-Amyloid Precursor Protein
β-Secretase Inhibitor III
γ-Secretase Inhibitor XII
(±)-Ibuprofen
(S)-(+)-Ibuprofen

Anti-β-Amyloid (1-43)
Anti-BACE1, C-Terminal (485-501) 
Anti-Nicastrin, C-Terminal 
Anti-Nicastrin, N-Terminal 
Anti-Reelin 

PEPTIDES FOR ANGIOGENESIS 



MT1-MMP Hemopexin Domain, His-Tag®, Human, Recombinant 
MT2-MMP Hemopexin Domain, His*Tag®, Human, Recombinant 
VEGF Inhibitor 



FIG. 17hhhh 



WO 03/092581 



105/114 



PCT/US02/22821 



Points for Regulation of Various Metabolic Control Mechanisms

SEQUENCE LISTING



<110> HK Pharmaceuticals, Inc. 
Koster, Hubert 
Siddiqi, Suhaib 
Little, Daniel 



<120> Capture Compounds, Collections Thereof 

And Methods For Analyzing The Proteome And Complex 
Compositions 

<130> 24743-2305 

<140> Not Yet Assigned 
<141> Herewith 

<150> 60/306,019 
<151> 2001-07-16 



<150> 60/314,123 
<151> 2001-08-21 



<150> 60/363,433 
<151> 2002-03-11 



<160> 149 

<170> FastSEQ for Windows Version 4.0 

<210> 1 

<211> 39 

<212> PRT 

<213> Homo Sapien 



<400> 1 

Ser Tyr Ser Met Glu His Phe Arg Trp Gly Lys Pro Val Gly Lys Lys
1               5                   10                  15
Arg Arg Pro Val Lys Val Tyr Pro Asn Gly Ala Glu Asp Glu Ser Ala
            20                  25                  30
Glu Ala Phe Pro Leu Glu Phe
        35



<210> 2 

<211> 52 

<212> PRT 

<213> Homo Sapien 



<400> 2 

Tyr Arg Gln Ser Met Asn Asn Phe Gln Gly Leu Arg Ser Phe Gly Cys
1               5                   10                  15
Arg Phe Gly Thr Cys Thr Val Gln Lys Leu Ala His Gln Ile Tyr Gln
            20                  25                  30
Phe Thr Asp Lys Asp Lys Asp Asn Val Ala Pro Arg Ser Lys Ile Ser
        35                  40                  45
Pro Gln Gly Tyr
    50



<210> 3 

<211> 13 

<212> PRT 

<213> Homo Sapien 



<400> 3 



[sequence residues not recoverable from the source text]



<210> 4 

<211> 13 

<212> PRT 

<213> Homo Sapien 

<400> 4 

Trp Gly Lys Pro Val Ser Tyr Ser Met Glu His Phe Arg 
1 5 10 



<210> 5 

<211> 9 

<212> PRT 

<213> Homo Sapien 

<400> 5 

Ala Pro Arg Glu Arg Phe Tyr Ser Glu 
1 5 



<210> 6 

<211> 10 

<212> PRT 

<213> Homo Sapien 

<400> 6 

Tyr Gly Gly Phe Leu Arg Lys Tyr Pro Lys 
1 5 10



<210> 7 

<211> 14 

<212> PRT 

<213> Homo Sapien 

<220> 

<221> AMIDATION 
<222> 14 

<221> MOD_RES
<222> 1 

<223> Xaa is pyroglutamic acid 
<400> 7 

Xaa Gly Arg Leu Gly Thr Gln Trp Ala Val Gly His Leu Met
1 5 10 



<210> 8 

<211> 37 

<212> PRT 

<213> Homo Sapien 

<400> 8 

Lys Cys Asn Thr Ala Thr Cys Ala Thr Asn Arg Leu Ala Asn Phe Leu
1               5                   10                  15
Val His Ser Ser Asn Asn Phe Gly Ala Ile Leu Ser Ser Thr Asn Val
            20                  25                  30
Gly Ser Asn Thr Tyr
        35


<210> 9 

<211> 10 

<212> PRT 

<213> Homo Sapien 

<400> 9 

Asp Arg Val Tyr Ile His Pro Phe His Leu
1 5 10



<210> 10 

<211> 8 

<212> PRT 

<213> Homo Sapien 

<400> 10 

Asp Arg Val Tyr Ile His Pro Phe
1 5 



<210> 11 

<211> 7 

<212> PRT 

<213> Homo Sapien 

<400> 11 

Arg Val Tyr Ile His Pro Phe
1 5 



<210> 12 

<211> 13 

<212> PRT 

<213> Homo Sapien 

<400> 12 

Asn Arg Pro Arg Leu Ser His Leu Gly Pro Met Pro Phe 
1 5 10 



<210> 13
<211> 29
<212> PRT
<213> Homo Sapien

<220>
<221> MOD_RES
<222> 1
<223> Xaa is D-Phe

<221> MOD_RES
<222> 10
<223> Nle

<221> MOD_RES
<222> 26
<223> Nle

<400> 13

Xaa His Leu Leu Arg Glu Val Leu Glu Xaa Ala Arg Ala Glu Gln Leu
1               5                   10                  15
Ala Gln Glu Ala His Lys Asn Arg Leu Xaa Glu Ile Ile
            20                  25



<210> 14 

<211> 28 

<212> PRT 

<213> Homo Sapien 

<400> 14 

Ser Leu Arg Arg Ser Ser Cys Phe Gly Gly Arg Met Asp Arg Ile Gly
1 5 10 15
Ala Gln Ser Gly Leu Gly Cys Asn Ser Phe Arg Tyr
20 25



<210> 15 

<211> 13 

<212> PRT 

<213> Homo Sapien 

<400> 15

Lys Lys Ala Leu Arg Arg Gln Glu Thr Val Asp Ala Leu
1 5 10 



<210> 16 

<211> 12 

<212> PRT 

<213> Homo Sapien 

<400> 16

Tyr Gly Gly Phe Met Arg Arg Val Gly Arg Pro Glu
1 5 10



<210> 17 

<211> 14 

<212> PRT 

<213> Homo Sapien 

<400> 17

Tyr Gly Gly Phe Met Arg Arg Val Gly Arg Pro Glu Trp Trp
1 5 10



<210> 18 

<211> 12 

<212> PRT 

<213> Homo Sapien 

<400> 18
Tyr Gly Gly Phe Met Arg Arg Val Gly Arg Pro Glu
1 5 10



<210> 19

<211> 31 

<212> PRT 

<213> Homo Sapien 



<400> 19 

Tyr Gly Gly Phe Met Thr Ser Glu Lys Ser Gln Thr Pro Leu Val Thr
1 5 10 15
Leu Phe Lys Asn Ala Ile Ile Lys Asn Ala Tyr Lys Lys Gly Glu
20 25 30



<210> 20 

<211> 22 

<212> PRT 

<213> Homo Sapien 



<400> 20

Ala Glu Lys Lys Asp Glu Gly Pro Tyr Arg Met Glu His Phe Arg Trp
1 5 10 15
Gly Ser Pro Pro Lys Asp
20



<210> 21 

<211> 9 

<212> PRT 

<213> Homo Sapien 



<400> 21 

Tyr Gly Gly Phe Leu Arg Lys Tyr Pro 
1 5 



<210> 22 

<211> 43 

<212> PRT 

<213> Homo Sapien 



<400> 22

Asp Ala Glu Phe Arg His Ala Ser Gly Tyr Glu Val His His Gln Lys
1 5 10 15
Leu Val Phe Phe Ala Glu Asp Val Gly Ser Asn Leu Gly Ala Ile Ile
20 25 30
Gly Leu Met Val Gly Gly Val Val Ile Ala Thr
35 40



<210> 23 

<211> 5 

<212> PRT 

<213> Homo Sapien 

<400> 23 

Arg Leu Arg Phe His 
1 5 



<210> 24 
<211> 32 
<212> PRT 



<213> Homo Sapien

<400> 24

Ser Pro Lys Met Val Gln Gly Ser Gly Cys Phe Gly Arg Lys Met Asp
1 5 10 15
Arg Ile Ser Ser Ser Ser Gly Leu Gly Cys Lys Val Leu Arg Arg His
20 25 30



<210> 25 

<211> 9 

<212> PRT 

<213> Homo Sapien 

<400> 25

Arg Pro Pro Gly Phe Ser Pro Phe Arg 

1 5 



<210> 26 

<211> 11 

<212> PRT 

<213> Homo Sapien 

<220> 

<221> AMIDATION 
<222> 11 

<400> 26
Gly Met Asp Ser Leu Ala Phe Ser Gly Gly Leu
1 5 10 



<210> 27 

<211> 3 

<212> PRT 

<213> Homo Sapien 

<220> 

<221> AMIDATION 
<222> 3 

<400> 27 
Lys His Gly 
1 



<210> 28 

<211> 11 

<212> PRT 

<213> Homo Sapien 

<400> 28

Ala Ser Lys Lys Pro Lys Arg Asn Ile Lys Ala
1 5 10



<210> 29 
<211> 10 



<212> PRT

<213> Homo Sapien 

<220> 

<221> MOD_RES
<222> 4

<223> Tyrosine-SO3H

<221> MOD_RES
<222> 1

<223> Xaa is pyroglutamic acid
<400> 29

Xaa Gln Asp Xaa Thr Gly Trp Met Asp Phe
1 5 10 



<210> 30 

<211> 28 

<212> PRT 

<213> Homo Sapien 

<400> 30 

Ala Ile Pro Ile Thr Ser Phe Glu Glu Ala Lys Gly Leu Asp Arg Ile
1 5 10 15
Asn Glu Arg Met Pro Pro Arg Arg Asp Ala Met Pro
20 25



<210> 31 

<211> 32 

<212> PRT 

<213> Homo Sapien 

<400> 31 

Cys Gly Asn Leu Ser Thr Cys Met Leu Gly Thr Tyr Thr Gln Asp Phe
1 5 10 15
Asn Lys Phe His Thr Phe Pro Gln Thr Ala Ile Gly Val Gly Ala Pro
20 25 30



<210> 32 

<211> 27 

<212> PRT 

<213> Homo Sapien 

<400> 32 

Asp Pro Met Ser Ser Thr Tyr Ile Glu Glu Leu Gly Lys Arg Glu Val
1 5 10 15
Thr Ile Pro Pro Lys Tyr Arg Glu Leu Leu Ala
20 25



<210> 33 

<211> 25 

<212> PRT 

<213> Homo Sapien 

<400> 33
Asn Gln Gly Arg His Phe Cys Gly Gly Ala Glu Ile His Ala Arg Phe
1 5 10 15

Val Met Thr Ala Ala Ser Cys Phe Asn 

20 25 



<210> 34 

<211> 30 

<212> PRT 

<213> Homo Sapien 

<400> 34 

Asn Pro Met Tyr Asn Ala Val Ser Asn Ala Asp Leu Met Asp Phe Lys 

1 5 10 15

Asn Leu Leu Asp His Leu Glu Glu Lys Met Pro Leu Glu Asp 

20 25 30 



<210> 35 

<211> 18 

<212> PRT 

<213> Homo Sapien 

<400> 35 

Cys Asn Leu Ala Val Ala Ala Ala Ser His Ile Tyr Gln Asn Gln Phe
1 5 10 15
Val Gln



<210> 36 

<211> 35 

<212> PRT 

<213> Homo Sapien 

<400> 36 

Lys Trp Lys Val Phe Lys Lys Ile Glu Lys Met Gly Arg Asn Ile Arg
1 5 10 15
Asn Gly Ile Val Lys Ala Gly Pro Ala Ile Ala Val Leu Gly Glu Ala

20 25 30 

Lys Ala Leu 
35 



<210> 37 

<211> 16 

<212> PRT 

<213> Homo Sapien 

<400> 37 

Ser Gly Ser Ala Lys Val Ala Phe Ser Ala Ile Arg Ser Thr Asn His
1 5 10 15 



<210> 38 

<211> 37 

<212> PRT 

<213> Homo Sapien 

<400> 38 

Ala Cys Asp Thr Ala Thr Cys Val Thr His Arg Leu Ala Gly Leu Leu 
1 5 10 15 



Ser Arg Ser Gly Gly Val Val Lys Asn Asn Phe Val Pro Thr Asn Val

20 25 30 

Gly Ser Lys Ala Phe 
35 



<210> 39 

<211> 37 

<212> PRT 

<213> Homo Sapien 

<400> 39 

Ala Cys Asn Thr Ala Thr Cys Val Thr His Arg Leu Ala Gly Leu Leu
1 5 10 15
Ser Arg Ser Gly Gly Met Val Lys Ser Asn Phe Val Pro Thr Asn Val
20 25 30
Gly Ser Lys Ala Phe
35



<210> 40 

<211> 17 

<212> PRT 

<213> Homo Sapien 

<400> 40

Leu Gln Asn Arg Arg Gly Leu Asp Leu Leu Phe Leu Lys Glu Gly Gly

1 5 10 15 

Leu 



<210> 41 

<211> 29 

<212> PRT 

<213> Homo Sapien 

<400> 41 

Gln Glu Gly Ala Pro Pro Gln Gln Ser Ala Arg Arg Asp Arg Met Pro
1 5 10 15
Cys Arg Asn Phe Phe Trp Lys Thr Phe Ser Ser Cys Lys
20 25



<210> 42 

<211> 2 

<212> PRT 

<213> Homo Sapien 

<400> 42 
Trp Gly 
1 



<210> 43 

<211> 30 

<212> PRT 

<213> Homo Sapien 

<400> 43
Ala Cys Tyr Cys Arg Ile Pro Ala Cys Ile Ala Gly Glu Arg Arg Tyr



<210> 44

<211> 29 

<212> PRT 

<213> Homo Sapien 

<400> 44 

Cys Tyr Cys Arg Ile Pro Ala Cys Ile Ala Gly Glu Arg Arg Tyr Gly
1 5 10 15
Thr Cys Ile Tyr Gln Gly Arg Leu Trp Ala Phe Cys Cys
20 25



<210> 45 

<211> 33 

<212> PRT 

<213> Homo Sapien 

<400> 45

Ala Leu Trp Lys Thr Met Leu Lys Lys Leu Gly Thr Met Ala Leu His
1 5 10 15
Ala Gly Lys Ala Ala Leu Gly Ala Ala Ala Asp Thr Ile Ser Gln Thr
20 25 30
Gln



<210> 46 

<211> 17 

<212> PRT 

<213> Homo Sapien 




Gln



<210> 47 

<211> 13 

<212> PRT 

<213> Homo Sapien 

<400> 47 

Tyr Gly Gly Phe Leu Arg Arg Gln Phe Lys Val Val Thr
1 5 10



<210> 48 

<211> 11 

<212> PRT 

<213> Homo Sapien 

<220> 



<221> AMIDATION 



<222> 11

<221> MOD_RES 
<222> 1 

<223> Xaa is pyroglutamic acid 
<400> 48 

Xaa Pro Ser Lys Asp Ala Phe Ile Gly Leu Met
1 5 10



<210> 49 

<211> 4 

<212> PRT 

<213> Homo Sapien 

<400> 49 
Tyr Pro Trp Phe 
1 



<210> 50 

<211> 4 

<212> PRT 

<213> Homo Sapien 

<400> 50 
Tyr Pro Phe Phe 
1 



<210> 51 

<211> 21 

<212> PRT 

<213> Homo Sapien 

<400> 51 

Cys Ser Cys Ser Ser Leu Met Asp Lys Glu Cys Val Tyr Phe Cys His
1 5 10 15
Leu Asp Ile Ile Trp
20



<210> 52 

<211> 39 

<212> PRT 

<213> Homo Sapien 

<220> 

<221> AMIDATION 
<222> 39 

<400> 52 

His Ser Asp Gly Thr Phe Thr Ser Asp Leu Ser Lys Gln Met Glu Glu
1 5 10 15
Glu Ala Val Arg Leu Phe Ile Glu Trp Leu Lys Asn Gly Gly Pro Ser
20 25 30
Ser Gly Ala Pro Pro Pro Ser
35



<210> 53

<211> 17 

<212> PRT 

<213> Homo Sapien 



<400> 53

Ala Ala Asp Ser Gly Glu Gly Asp Phe Leu Ala Glu Gly Gly Gly Val
1 5 10 15



Arg 



<210> 54 

<211> 15 

<212> PRT 

<213> Homo Sapien 



<400> 54 

Asx Gln Gly Val Asn Asp Asn Glu Glu Gly Phe Phe Ser Ala Arg
1 5 10 15



<210> 55 

<211> 8 

<212> PRT 

<213> Homo Sapien 

<400> 55 

Glu Ile Leu Asp Val Pro Ser Thr
1 5 



<210> 56 

<211> 4 

<212> PRT 

<213> Homo Sapien 

<400> 56 
Phe Met Arg Phe 
1 



<210> 57 

<211> 30 

<212> PRT 

<213> Homo Sapien 

<400> 57 

Gly Trp Thr Leu Asn Ser Ala Gly Tyr Leu Leu Gly Pro His Ala Val
1 5 10 15
Gly Asn His Arg Ser Phe Ser Asp Lys Asn Gly Leu Thr Ser
20 25 30



<210> 58 

<211> 20 

<212> PRT 

<213> Homo Sapien 



<220> 



<221> AMIDATION
<222> 20 

<400> 58 

Gly Trp Thr Leu Asn Ser Ala Gly Tyr Leu Leu Gly Pro Gln Gln Phe
1 5 10 15
Phe Gly Leu Met
20



<210> 59 

<211> 5 

<212> PRT 

<213> Homo Sapien 

<400> 59 

Arg Leu Arg Phe Asp 
1 5 



<210> 60 

<211> 17 

<212> PRT 

<213> Homo Sapien 

<400> 60 

Glu Gly Pro Trp Leu Glu Glu Glu Glu Glu Ala Tyr Gly Trp Met Asp 

1 5 10 15

Phe 



<210> 61 

<211> 27 

<212> PRT 

<213> Homo Sapien 

<400> 61 

Val Pro Leu Pro Ala Gly Gly Gly Thr Val Leu Thr Lys Met Tyr Pro
1 5 10 15
Arg Gly Asn His Trp Ala Val Gly His Leu Met
20 25



<210> 62 

<211> 28 

<212> PRT 

<213> Homo Sapien 

<400> 62 

Gly Ser Ser Phe Leu Ser Pro Glu His Gln Arg Val Gln Gln Arg Lys
1 5 10 15
Glu Ser Lys Lys Pro Pro Ala Lys Leu Gln Pro Arg
20 25



<210> 63 

<211> 42 

<212> PRT 

<213> Homo Sapien 



<400> 63

Tyr Ala Glu Gly Thr Phe Ile Ser Asp Tyr Ser Ile Ala Met Asp Lys
1 5 10 15
Ile His Gln Gln Asp Phe Val Asn Trp Leu Leu Ala Gln Lys Gly Lys
20 25 30
Lys Asn Asp Trp Lys His Asn Ile Thr Gln
35 40



<210> 64 

<211> 29 

<212> PRT 

<213> Homo Sapien 



<400> 64

His Ser Gln Gly Thr Phe Thr Ser Asp Tyr Ser Lys Tyr Leu Asp Ser
1 5 10 15
Arg Arg Ala Gln Asp Phe Val Asp Trp Leu Met Asn Thr
20 25



<210> 65 

<211> 20 

<212> PRT 

<213> Homo Sapien 

<400> 65 

Arg Arg Phe Ala Cys Asp Pro Asp Gly Tyr Asp Asn Tyr Phe His Cys
1 5 10 15
Val Pro Gly Gly
20



<210> 66 

<211> 20 

<212> PRT 

<213> Homo Sapien 

<400> 66 

Thr Gly Ser Trp Cys Gly Leu Met His Tyr Asp Asn Ala Trp Leu Cys
1 5 10 15
Asn Thr Gln Gly
20



<210> 67 

<211> 20 

<212> PRT 

<213> Homo Sapien 




Trp Thr Gln Gly

20 



<210> 68 
<211> 20 
<212> PRT 
<213> Homo Sapien

<400> 68

Arg Ser Thr Leu Cys Trp Phe Glu Gly Tyr Asp Asn Thr Phe Pro Cys 

1 5 10 15 

Lys Tyr Phe Arg 

20 



<210> 69 

<211> 20 

<212> PRT 

<213> Homo Sapien 

<400> 69 

Arg Val Gln Glu Cys Lys Tyr Leu Tyr Tyr Asp Asn Asp Tyr Leu Cys
1 5 10 15
Lys Asp Asp Gly
20



<210> 70 

<211> 20 

<212> PRT 

<213> Homo Sapien 

<400> 70 

Gly Leu Arg Arg Cys Leu Tyr Gly Pro Tyr Asp Asn Ala Trp Val Cys
1 5 10 15
Asn Ile His Glu

20 



<210> 71 

<211> 20 

<212> PRT 

<213> Homo Sapien 

<400> 71 

Lys Leu Phe Trp Cys Thr Tyr Glu Asp Tyr Ala Asn Glu Trp Pro Cys
1 5 10 15
Pro Gly Tyr Ser
20



<210> 72 

<211> 20 

<212> PRT 

<213> Homo Sapien 

<400> 72 

Phe Cys Ala Val Cys Asn Glu Glu Leu Tyr Glu Asn Cys Gly Gly Cys 

1 5 10 15

Ser Cys Gly Lys 

20 



<210> 73 

<211> 20 

<212> PRT 

<213> Homo Sapien 



<400> 73 



Arg Thr Ser Pro Cys Gly Tyr Ile Gly Tyr Asp Asn Ile Phe Glu Cys
1 5 10 15
Thr Tyr Leu Gly

20 



<210> 74 
<211> 20 
<212> PRT 
<213> Homo Sapien



<400> 74 

Thr Gly Glu Trp Cys Ala Gln Ser Val Tyr Ala Asn Tyr Asp Asn Cys
1 5 10 15
Lys Ser Ala Trp
20



<210> 75 
<211> 20 
<212> PRT 
<213> Homo Sapien



<400> 75 

Asn Val Ser Arg Cys Thr Tyr Ile His Tyr Asp Asn Trp Ser Leu Cys
1 5 10 15
Gly Val Glu Val
20



<210> 76 

<211> 20 

<212> PRT 

<213> Homo Sapien 




Ser Asp Tyr Ser 

20 



<210> 77 

<211> 44 

<212> PRT 

<213> Homo Sapien 



<400> 77

Tyr Ala Asp Ala Ile Phe Thr Asn Ser Tyr Arg Lys Val Leu Gly Gln
1 5 10 15
Leu Ser Ala Arg Lys Leu Leu Gln Asp Ile Met Ser Arg Gln Gln Gly
20 25 30
Glu Ser Asn Gln Glu Arg Gly Ala Arg Ala Arg Leu
35 40



<210> 78 

<211> 15 

<212> PRT 

<213> Homo Sapien 



<400> 78

Pro Gly Thr Cys Glu Ile Cys Ala Tyr Ala Ala Cys Thr Gly Cys
1 5 10 15 



<210> 79 

<211> 35 

<212> PRT 

<213> Homo Sapien 

<220> 

<221> AMIDATION 
<222> 35 

<400> 79

His Ser Asp Ala Ile Phe Thr Glu Glu Tyr Ser Lys Leu Leu Ala Lys
1 5 10 15
Leu Ala Leu Gln Lys Tyr Leu Ala Ser Ile Leu Gly Ser Arg Thr Ser

20 25 30 

Pro Pro Pro 
35 



<210> 80 

<211> 38 

<212> PRT 

<213> Homo Sapien 

<400> 80

His Ser Asp Ala Thr Phe Thr Ala Glu Tyr Ser Lys Leu Leu Ala Lys
1 5 10 15
Leu Ala Leu Gln Lys Tyr Leu Glu Ser Ile Leu Gly Ser Ser Thr Ser
20 25 30

Pro Arg Pro Pro Ser Ser 
35 



<210> 81 

<211> 37 

<212> PRT 

<213> Homo Sapien 

<400> 81 

His Ser Asp Ala Thr Phe Thr Ala Glu Tyr Ser Lys Leu Leu Ala Lys 

1 5 10 15
Leu Ala Leu Gln Lys Tyr Leu Glu Ser Ile Leu Gly Ser Ser Thr Ser
20 25 30

Pro Arg Pro Pro Ser 
35 



<210> 82 

<211> 24 

<212> PRT 

<213> Homo Sapien 

<400> 82 

Asp Ser His Ala Lys Arg His His Gly Tyr Lys Arg Lys Phe His Glu
1 5 10 15

Lys His His Ser His Arg Gly Tyr 



20



<210> 83 

<211> 4 

<212> PRT 

<213> Homo Sapien 

<220> 

<221> ACETYLATION
<222> 1 

<221> MOD_RES 

<223> Xaa is Aspartic acid-fluoroacetylmethylketone

<400> 83 
Tyr Val Ala Xaa 
1 



<210> 84 

<211> 6 

<212> PRT 

<213> Homo Sapien 

<400> 84 

Val Glu Pro Ile Pro Tyr
1 5 



<210> 85 

<211> 21 

<212> PRT 

<213> Homo Sapien 

<400> 85 

Gly Ile Val Glu Gln Cys Cys Thr Ser Ile Cys Ser Leu Tyr Gln Leu
1 5 10 15
Glu Asn Tyr Cys Asn
20



<210> 86 

<211> 30 

<212> PRT 

<213> Homo Sapien 



<400> 86 

Phe Val Asn Gln His Leu Cys Gly Ser His Leu Val Glu Ala Leu Tyr
1 5 10 15
Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr Thr Pro Lys Thr
20 25 30



<210> 87 

<211> 51 

<212> PRT 

<213> Homo Sapien 



<400> 87

Gly Ile Val Glu Gln Cys Cys Thr Ser Ile Cys Ser Leu Tyr Gln Leu
1 5 10 15
Glu Asn Tyr Cys Asn Phe Val Asn Gln His Leu Cys Gly Ser His Leu
20 25 30
Val Glu Ala Leu Tyr Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr Thr
35 40 45
Pro Lys Thr
50



<210> 88 

<211> 9 

<212> PRT 

<213> Homo Sapien 

<400> 88 

Ile Ala Arg Arg His Pro Tyr Phe Leu
1 5 



<210> 89 

<211> 5 

<212> PRT 

<213> Homo Sapien 

<400> 89 

Tyr Gly Gly Phe Leu 
1 5 



<210> 90 

<211> 9 

<212> PRT 

<213> Homo Sapien 

<220> 

<221> AMIDATION 
<222> 9 

<221> MOD_RES
<222> 1

<223> Xaa is pyroglutamic acid
<400> 90

Xaa Gln Trp Ala Val Gly His Phe Met
1 5 



<210> 91 

<211> 14 

<212> PRT 

<213> Homo Sapien 



<400> 91 

Arg Thr Lys Arg Ser Gly Ser Val Tyr Glu Pro Leu Lys Ile
1 5 10



<210> 92 



<211> 5

<212> PRT 

<213> Homo Sapien 

<400> 92 

Tyr Gly Gly Phe Met
1 5



<210> 93 

<211> 9 

<212> PRT 

<213> Homo Sapien 

<220> 

<221> AMIDATION 
<222> 9 

<400> 93 

Tyr Gly Gly Gly Phe Met Arg Arg Val
1 5



<210> 94 

<211> 22 

<212> PRT 

<213> Homo Sapien 



<400> 94 

Phe Val Pro Ile Phe Thr Tyr Gly Glu Leu Gln Arg Met Gln Glu Lys
1 5 10 15
Glu Arg Asn Lys Gly Gln
20



<210> 95 

<211> 9 

<212> PRT 

<213> Homo Sapien 



<400> 95 

Pro Met Ser Met Leu Arg Leu Asn His 
1 5 



<210> 96 

<211> 13 

<212> PRT 

<213> Homo Sapien 

<400> 96 

Ile Pro Lys Lys Arg Ala Ala Arg Ala Thr Ser Asn His
1 5 10 



<210> 97 

<211> 6 

<212> PRT 

<213> Homo Sapien 






<400> 97 

Gly Ala Val Ser Thr Ala 
1 5 



<210> 98 

<211> 10 

<212> PRT 

<213> Homo Sapien 

<220> 

<221> AMIDATION 
<222> 10 

<400> 98 

His Lys Thr Asp Ser Phe Val Gly Leu Met 
1 5 10



<210> 99 

<211> 10 

<212> PRT 

<213> Homo Sapien 

<220> 

<221> AMIDATION 
<222> 10 

<400> 99 

Asp Met His Asp Phe Phe Val Gly Leu Met
1 5 10



<210> 100 

<211> 10 

<212> PRT 

<213> Homo Sapien 

<220> 

<221> AMIDATION 
<222> 10 

<400> 100 

Gly Asn Leu Trp Ala Thr Gly His Phe Met 
1 5 10 



<210> 101 

<211> 36 

<212> PRT 

<213> Homo Sapien 

<220> 

<221> AMIDATION 
<222> 36 



<400> 101 



Tyr Pro Ser Lys Pro Asp Asn Pro Gly Glu Asp Ala Pro Ala Glu Asp
1 5 10 15
Met Ala Arg Tyr Tyr Ser Ala Lys Arg His Tyr Ile Asn Leu Ile Thr
20 25 30
Arg Gln Arg Tyr
35



<210> 102 

<211> 12 

<212> PRT 

<213> Homo Sapien 

<220> 

<221> MOD_RES 
<222> 1 

<223> Xaa is pyroglutamic acid 

<400> 102
Xaa Leu Tyr Glu Asn Lys Pro Arg Arg Pro Ile Leu
1 5 10 



<210> 103 

<211> 17 

<212> PRT 

<213> Homo Sapien 




Gln



<210> 104 

<211> 31 

<212> PRT 

<213> Homo Sapien 



<400> 104

Phe Ala Glu Pro Leu Pro Ser Glu Glu Glu Gly Glu Ser Tyr Ser Lys

1 5 10 15 

Glu Val Pro Glu Met Glu Lys Arg Tyr Gly Gly Phe Met Arg Phe 

20 25 30 



<210> 105 

<211> 6 

<212> PRT 

<213> Homo Sapien 

<400> 105 

Glu Gin Lys Gin Leu Gin 
1 5



<210> 106 
<211> 33 
<212> PRT 



<213> Homo Sapien
<220> 

<221> AMIDATION 
<222> 33 

<221> MOD_RES 

<222> 1
<223> Xaa is pyroglutamic acid



<400> 106

Xaa Pro Leu Pro Asp Cys Cys Arg Gln Lys Thr Cys Ser Cys Arg Leu
1 5 10 15
Tyr Glu Leu Leu His Gly Ala Gly Asn His Ala Ala Gly Ile Leu Thr
20 25 30
Leu



<210> 107 

<211> 28 

<212> PRT 

<213> Homo Sapien 

<220> 

<221> AMIDATION 
<222> 28 

<400> 107 

Arg Ser Gly Pro Pro Gly Leu Gln Gly Arg Leu Gln Arg Leu Leu Gln
1 5 10 15
Ala Ser Gly Asn His Ala Ala Gly Ile Leu Thr Met
20 25



<210> 108 

<211> 49 

<212> PRT 

<213> Homo Sapien 



<400> 108

Tyr Leu Tyr Gln Trp Leu Gly Ala Pro Val Pro Tyr Pro Asp Pro Leu
1 5 10 15
Glu Pro Arg Arg Glu Val Cys Glu Leu Asn Pro Asp Cys Asp Glu Leu
20 25 30
Ala Asp His Ile Gly Phe Gln Glu Ala Tyr Arg Arg Phe Tyr Gly Pro
35 40 45
Val



<210> 109 

<211> 11 

<212> PRT 

<213> Homo Sapien 



<400> 109

~, -ri rt m r> acn fH/c Pro Leu Glv ASA uis 

1 



Cys Tyr Ile Gln Asn Cys Pro Leu Gly
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<210> 110 

<211> 27 

<212> PRT 

<213> Homo Sapien 



<400> 110 

His Ser Asp Gly Ile Phe Thr Asp Ser Tyr Ser Arg Tyr Arg Lys Gln
1 5 10 15
Met Ala Val Lys Lys Tyr Leu Ala Ala Val Leu
20 25



<210> 111 

<211> 29 

<212> PRT 

<213> Homo Sapien 



<400> 111 

Asp Val Ala His Gly Ile Leu Asn Glu Ala Tyr Arg Lys Val Leu Asp
1 5 10 15
Gln Leu Ser Ala Gly Lys His Leu Gln Ser Leu Val Ala
20 25



<210> 112 

<211> 38 

<212> PRT 

<213> Homo Sapien 



<400> 112

Ala Pro Leu Glu Pro Val Tyr Pro Gly Asp Asn Ala Thr Pro Glu Gln
1 5 10 15
Met Ala Gln Tyr Ala Ala Asp Leu Arg Arg Tyr Ile Asn Met Leu Thr
20 25 30
Arg Pro Arg Tyr Asn His
35



<210> 113 

<211> 4

<212> PRT 

<213> Homo Sapien 

<400> 113 
Gly Gly Tyr Arg 
1 



<210> 114 

<211> 12 

<212> PRT 

<213> Homo Sapien 

<400> 114

Tyr Gly Gly Phe Met Arg Arg Val Gly Arg Pro Glu 

1 5 10 



<210> 115 
<211> 36 



<212> PRT

<213> Homo Sapien 

<220> 

<221> AMIDATION 
<222> 36 

<400> 115

Tyr Pro Ile Lys Pro Glu Ala Pro Gly Glu Asp Ala Ser Pro Glu Glu
1 5 10 15
Leu Asn Arg Tyr Tyr Ala Ser Leu Arg His Tyr Leu Asn Leu Val Thr
20 25 30
Arg Gln Arg Tyr
35 



<210> 116 

<211> 9 

<212> PRT 

<213> Homo Sapien 

<400> 116 

Arg Arg Lys Ala Ser Gly Pro Pro Val 
1 5 



<210> 117 

<211> 11 

<212> PRT 

<213> Homo Sapien 

<220> 

<221> AMIDATION 
<222> 11 

<221> MOD_RES
<222> 1 

<223> Xaa is pyroglutamic acid 
<400> 117 

Xaa Ala Asp Pro Asn Lys Phe Tyr Gly Leu Met 
1 5 10 



<210> 118 

<211> 11 

<212> PRT 

<213> Homo Sapien 

<220> 

<221> AMIDATION 
<222> 11 

<221> MOD_RES
<222> 1 

<223> Xaa is pyroglutamic acid 



<400> 118 



Xaa Val Pro Gln Trp Ala Val Gly His Phe Met
1 5 10 



<210> 119 

<211> 5 

<212> PRT 

<213> Homo Sapien 

<220> 

<221> UNSURE 
<222> 1,5 

<223> Xaa is a variable 
<400> 119 

Xaa Arg Gly Asp Xaa 
1 5 



<210> 120 

<211> 4 

<212> PRT 

<213> Homo Sapien 

<400> 120 
Gly Gln Pro Arg
1 



<210> 121 

<211> 13 

<212> PRT 

<213> Homo Sapien 



<400> 121 

Arg Arg Leu Ile Glu Asp Ala Glu Tyr Ala Ala Arg Gly
1 5 10



<210> 122 

<211> 5 

<212> PRT 

<213> Homo Sapien 



<400> 122 

Arg Pro Thr Val Leu 
1 5 



<210> 123 

<211> 27 

<212> PRT 

<213> Homo Sapien 



<400> 123

His Ser Asp Gly Thr Phe Thr Ser Glu Leu Ser Arg Leu Arg Glu Gly
1 5 10 15
Ala Arg Leu Gln Arg Leu Leu Gln Gly Leu Val

20 25 



<210> 124

<211> 9 

<212> PRT 

<213> Homo Sapien 

<220> 

<221> MOD_RES 

<222> 1

<223> Xaa is pyroglutamic acid 

<400> 124 

Xaa Ala Lys Ser Gln Gly Gly Ser
1 5 



<210> 125 

<211> 19 

<212> PRT 

<213> Homo Sapien 

<400> 125

Pro Gln Cys Gly Lys Cys Arg Ile Cys Lys Asn Pro Glu Ser Asn Tyr
1 5 10 15

Cys Leu Lys 



<210> 126 

<211> 19 

<212> PRT 

<213> Homo Sapien 

<400> 126

Pro Gln Cys Gly Lys Cys Arg Val Cys Lys Asn Pro Glu Ser Asn Tyr
1 5 10 15

Cys Leu Lys 



<210> 127 

<211> 19 

<212> PRT 

<213> Homo Sapien 

<400> 127

Pro Gln Cys Gly Lys Cys Arg Ile Cys Lys Asn Pro Glu Ser Asn Tyr
1 5 10 15
Cys Leu Lys 



<210> 128 

<211> 19 

<212> PRT 

<213> Homo Sapien 



<400> 128

Pro Leu Cys Arg Lys Cys Lys Phe Cys Leu Ser Pro Leu Thr Asn Leu
1 5 10 15

Cys Gly Lys 
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<210> 129 

<211> 18 

<212> PRT 

<213> Homo Sapien 

Pro°Gli 2 Sly Glu Cys Lys Phe Cys Leu Asn Pro Lys Thr Asn Leu Cys 

1 5 10 15

Gln Lys



<210> 130 

<211> 11 

<212> PRT 

<213> Homo Sapien 

<220> 

<221> AMIDATION 
<222> 11 

<400> 130 

Arg Pro Lys Pro Gln Gln Phe Phe Gly Leu Met
1 5 10



<210> 131 

<211> 15 

<212> PRT 

<213> Homo Sapien 

<400> 131

Pro Leu Ala Arg Thr Leu Ser Val Ala Gly Leu Pro Gly Lys Lys
1 5 10 15 



<210> 132 

<211> 18 

<212> PRT 

<213> Homo Sapien 

<400> 132 

Ala Val Gln Ser Lys Pro Pro Ser Lys Arg Asp Pro Pro Lys Met Gln
1 5 10 15
Thr Asp



<210> 133 

<211> 36 

<212> PRT 

<213> Homo Sapien 



<400> 133

Thr Phe Gly Ser Gly Glu Ala Asp Cys Gly Leu Arg Pro Leu Phe Glu
1 5 10 15
Lys Lys Ser Leu Glu Asp Lys Thr Glu Arg Glu Leu Leu Glu Ser Tyr
20 25 30
Ile Asp Gly Arg
35



<210> 134 

<211> 5 

<212> PRT 

<213> Homo Sapien 

<400> 134 

Arg Lys Asp Val Tyr 
1 5 



<210> 135 

<211> 9 

<212> PRT 

<213> Homo Sapien 

<400> 135 

Gln Ala Lys Ser Gln Gly Gly Ser Asn
1 5 



<210> 136 

<211> 3 

<212> PRT 

<213> Homo Sapien 

<220> 

<221> MOD_RES 

<222> 1

<223> Xaa is pyroglutamic acid

<400> 136 
Xaa His Pro 
1 



<210> 137 

<211> 4 

<212> PRT 

<213> Homo Sapien 

<400> 137 
Thr Lys Pro Arg 
1 



<210> 138 

<211> 11 

<212> PRT 

<213> Homo Sapien 

<220> 



<221> AMIDATION 
<222> 11 



<221> MOD_RES

<222> 1

<223> Xaa is pyroglutamic acid

<400> 138

Xaa Pro Asp Pro Asn Ala Phe Tyr Gly Leu Met
1 5 10 



<210> 139 

<211> 5 

<212> PRT 

<213> Homo Sapien 



<400> 139 

Asp Leu Trp Gin Lys 
1 5 



<210> 140 

<211> 40 

<212> PRT 

<213> Homo Sapien 



<400> 140

Asp Asn Pro Ser Leu Ser Ile Asp Leu Thr Phe His Leu Leu Arg Thr
1 5 10 15
Leu Leu Glu Leu Ala Arg Thr Gln Ser Gln Arg Glu Arg Ala Glu Gln
20 25 30
Asn Arg Ile Ile Phe Asp Ser Val
35 40



<210> 141 

<211> 16 

<212> PRT 

<213> Homo Sapien 

<400> 141 



Asn Asp Asp Cys Glu Leu Cys Val Asn Val Ala Cys Thr Gly Cys Leu 



1 5 10 



<210> 142 

<211> 27 

<212> PRT 

<213> Homo Sapien 



<400> 142

Gly Leu Ser Lys Gly Cys Phe Gly Leu Lys Leu Asp Arg Ile Gly Ser
1 5 10 15
Met Ser Gly Leu Gly Cys Asn Ser Phe Arg Tyr
20 25



<210> 143 

<211> 9 

<212> PRT 

<213> Homo Sapien 



<400> 143 



Cys Tyr Phe Gln Asn Cys Pro Arg Gly
1 5 



<210> 144 

<211> 9 

<212> PRT 

<213> Homo Sapien 

<400> 144 

Cys Tyr Ile Gln Asn Cys Pro Arg Gly
1 5 



<210> 145 

<211> 28 

<212> PRT 

<213> Homo Sapien 

<400> 145 



His Ser Asp Ala Val Phe Thr Asp Asn Tyr Thr Arg Leu Arg Lys Gln
1 5 10 15
Met Ala Val Lys Lys Tyr Leu Asn Ser Ile Leu Asn
20 25



<210> 146 

<211> 25 

<212> PRT 

<213> Homo Sapien 

<400> 146

Met Leu Thr Lys Phe Glu Thr Lys Ser Ala Arg Val Lys Gly Leu Ser
1 5 10 15
Phe His Pro Lys Arg Pro Trp Ile Leu
20 25



<210> 147 

<211> 3 

<212> PRT 

<213> Homo Sapien 

<220> 

<221> UNSURE 
<222> 2 

<223> Xaa is a variable 

<400> 147 
Tyr Xaa Asn 
1 



<210> 148 

<211> 9

<212> PRT 

<213> Homo Sapien 

<400> 148 

Phe Gln Phe His Phe His Trp Gly Ser



<210> 149

<211> 11 

<212> PRT 

<213> Homo Sapien 

^e°ile 4 ?le Gin Phe His Phe His Trp Gly Ser 
1 5 10 
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